pip install xgboost
Requirement already satisfied: xgboost in /Users/alperenunal/anaconda3/lib/python3.11/site-packages (2.0.2) Requirement already satisfied: numpy in /Users/alperenunal/anaconda3/lib/python3.11/site-packages (from xgboost) (1.24.3) Requirement already satisfied: scipy in /Users/alperenunal/anaconda3/lib/python3.11/site-packages (from xgboost) (1.11.1) Note: you may need to restart the kernel to use updated packages.
import pandas as pd
import numpy as np
import plotly.express as px
import plotly.graph_objects as go
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.inspection import permutation_importance
from sklearn.feature_selection import mutual_info_regression
from sklearn.preprocessing import MinMaxScaler
from xgboost import XGBRegressor
from xgboost import plot_importance
Loading the datasets that were fetched from VeloData through its API
btcusdt_spot = pd.read_csv("btcusdt_spot_velo_1hour_010121.csv")
ethusdt_spot = pd.read_csv("ethusdt_spot_velo_1hour_010121.csv")
btcusdt_futures = pd.read_csv("btcusdt_futures_velo_1hour_010121.csv")
ethusdt_futures = pd.read_csv("ethusdt_futures_velo_1hour_010121.csv")
The ETF dataset was created by identifying ETF names listed on the Blockworks website. The announcement date for each ETF event was retrieved from the official websites of the respective ETF issuers and manually added to the dataset to accurately track ETF events in the cryptocurrency market. The dataset contains important dates and relevant information about various ETFs and associated cryptocurrencies (BTC or ETH).
# Loading manually created ETF data
etf_df = pd.read_csv('ETF_List_sorted.csv')
etf_df
| | timestamp | Company | Currency | Trade |
|---|---|---|---|---|
| 0 | 2017-11-22 | Bitwise 10 Crypto Index Fund | ETH | Spot |
| 1 | 2021-03-26 | VanEck Ethereum Strategy ETF | ETH | Spot |
| 2 | 2021-10-21 | Valkyrie Bitcoin and Ether Strategy ETF | ETH | Futures |
| 3 | 2021-11-15 | Global X Blockchain & Bitcoin Strategy ETF | BTC | Futures |
| 4 | 2021-11-18 | ProShares Bitcoin Strategy ETF | BTC | Futures |
| 5 | 2021-11-21 | Valkyrie Bitcoin and Ether Strategy ETF | BTC | Futures |
| 6 | 2022-09-15 | Hashdex Bitcoin Futures ETF | BTC | Futures |
| 7 | 2023-02-10 | ProShares Ether Strategy ETF | ETH | Spot |
| 8 | 2023-03-20 | Bitwise Bitcoin Strategy Optimum Yield ETF | BTC | Futures |
| 9 | 2023-09-29 | Bitwise Ethereum Strategy ETF | ETH | Spot |
| 10 | 2023-09-29 | Bitwise Bitcoin and Ether Equal Weight Strateg... | ETH | Spot |
| 11 | 2023-10-02 | ProShares Bitcoin & Ether Equal Weight Strateg... | BTC | Futures |
| 12 | 2023-10-02 | ProShares Bitcoin & Ether Market Cap Weight St... | ETH | Futures |
| 13 | 2023-10-02 | ProShares Bitcoin & Ether Equal Weight Strateg... | ETH | Futures |
| 14 | 2023-10-02 | ProShares Bitcoin & Ether Market Cap Weight St... | BTC | Futures |
| 15 | 2023-11-14 | ARK 21Shares Active Ethereum Futures Strategy | ETH | Spot |
| 16 | 2023-11-14 | ARK 21Shares Active Bitcoin Futures Strategy ETF | BTC | Futures |
| 17 | 2023-11-15 | ARK 21Shares Active On-Chain Bitcoin Strategy ETF | BTC | Futures |
| 18 | 2023-11-15 | ARK 21Shares Active Bitcoin Ethereum Strategy ETF | BTC | Futures |
| 19 | 2024-01-05 | iShares Bitcoin Trust | BTC | Spot |
| 20 | 2024-01-10 | Bitwise Bitcoin ETP | BTC | Spot |
| 21 | 2024-01-10 | Ark/21 Shares Bitcoin Trust | BTC | Spot |
| 22 | 2024-01-10 | Valkyrie Bitcoin Fund | BTC | Spot |
| 23 | 2024-01-11 | Invesco Galaxy Bitcoin ETF | BTC | Spot |
| 24 | 2024-01-11 | VanEck Bitcoin Trust | BTC | Spot |
| 25 | 2024-01-11 | WisdomTree Bitcoin Trust | BTC | Spot |
| 26 | 2024-01-11 | Franklin Bitcoin ETF | BTC | Spot |
| 27 | 2024-01-11 | Wise Origin Bitcoin Trust by Fidelity | BTC | Spot |
| 28 | 2024-01-11 | Grayscale Bitcoin Trust | BTC | Spot |
# Working on a copy without the unused columns so that etf_df stays unchanged
df = etf_df.drop(columns=['Company', 'Trade'])
# The first row was removed because it was an ETF event that
# took place before the start date of the historical data we will analyze.
df = df.iloc[1:].reset_index(drop=True)
# Converting 'timestamp' column to datetime format.
df['timestamp'] = pd.to_datetime(df['timestamp'])
# Creating 'btc_etf' and 'eth_etf' columns based on 'Currency' column.
df['btc_etf'] = df['Currency'].apply(lambda x: 1 if x == 'BTC' else 0)
df['eth_etf'] = df['Currency'].apply(lambda x: 1 if x == 'ETH' else 0)
# Filtering data between the specified start and end dates.
start_date = '2021-01-01 00:00:00'
end_date = '2024-07-30 18:00:00'
df = df[(df['timestamp'] >= start_date) & (df['timestamp'] <= end_date)].copy()
# Making the 'timestamp' values unique by adding a one-second offset to each duplicate.
df['timestamp'] += pd.to_timedelta(df.groupby('timestamp').cumcount(), unit='s')
# Resampling to hourly frequency with all missing hours filled with 0.
all_hours = pd.date_range(start=start_date, end=end_date, freq='H')
df = df.set_index('timestamp').reindex(all_hours, fill_value=0).reset_index()
df.rename(columns={'index': 'timestamp'}, inplace=True)
# For each day where btc_etf or eth_etf was 1, it remains 1 for all hours.
df['btc_etf'] = df.groupby(df['timestamp'].dt.date)['btc_etf'].transform('max')
df['eth_etf'] = df.groupby(df['timestamp'].dt.date)['eth_etf'].transform('max')
# Formatting the 'timestamp' column as 'YYYY-MM-DD HH:MM:SS' strings
df['timestamp'] = df['timestamp'].dt.strftime('%Y-%m-%d %H:%M:%S')
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 31363 entries, 0 to 31362
Data columns (total 4 columns):
 #   Column     Non-Null Count  Dtype
---  ------     --------------  -----
 0   timestamp  31363 non-null  object
 1   Currency   31363 non-null  object
 2   btc_etf    31363 non-null  int64
 3   eth_etf    31363 non-null  int64
dtypes: int64(2), object(2)
memory usage: 980.2+ KB
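One caveat about the cell above: the one-second offsets move same-day duplicates off the hourly grid, so `reindex` keeps only the first event of each day, and later events on that day never reach the daily `max`. A minimal sketch of an alternative, assuming the same `timestamp` and `Currency` columns, aggregates the flags per calendar day before expanding to hours. The toy events and the `daily_flags` and `hourly` names are illustrative and not part of the original notebook.

```python
import pandas as pd

# Toy events: two ETFs on the same day with different currencies (illustrative data).
events = pd.DataFrame({
    "timestamp": pd.to_datetime(["2023-11-14", "2023-11-14", "2023-11-15"]),
    "Currency": ["ETH", "BTC", "BTC"],
})
events["btc_etf"] = (events["Currency"] == "BTC").astype(int)
events["eth_etf"] = (events["Currency"] == "ETH").astype(int)

# One row per calendar day: a flag is 1 if any event on that day matches.
daily_flags = events.groupby(events["timestamp"].dt.normalize())[["btc_etf", "eth_etf"]].max()

# Expand to an hourly grid by looking up each hour's calendar day.
hours = pd.date_range("2023-11-13 00:00:00", "2023-11-16 23:00:00", freq="h")
hourly = daily_flags.reindex(hours.normalize(), fill_value=0)
hourly.index = hours
hourly = hourly.rename_axis("time").reset_index()

print(hourly[hourly["time"].dt.normalize() == "2023-11-14"].head())
```

With this ordering, a day that has both a BTC and an ETH event ends up with both flags set for all 24 hours.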
df['timestamp'] = pd.to_datetime(df['timestamp'])
df
| | timestamp | Currency | btc_etf | eth_etf |
|---|---|---|---|---|
| 0 | 2021-01-01 00:00:00 | 0 | 0 | 0 |
| 1 | 2021-01-01 01:00:00 | 0 | 0 | 0 |
| 2 | 2021-01-01 02:00:00 | 0 | 0 | 0 |
| 3 | 2021-01-01 03:00:00 | 0 | 0 | 0 |
| 4 | 2021-01-01 04:00:00 | 0 | 0 | 0 |
| ... | ... | ... | ... | ... |
| 31358 | 2024-07-30 14:00:00 | 0 | 0 | 0 |
| 31359 | 2024-07-30 15:00:00 | 0 | 0 | 0 |
| 31360 | 2024-07-30 16:00:00 | 0 | 0 | 0 |
| 31361 | 2024-07-30 17:00:00 | 0 | 0 | 0 |
| 31362 | 2024-07-30 18:00:00 | 0 | 0 | 0 |
31363 rows × 4 columns
df.drop(columns='Currency', inplace=True)
df.rename(columns={'timestamp': 'time'}, inplace=True)
df['time'] = pd.to_datetime(df['time'])
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 31363 entries, 0 to 31362
Data columns (total 3 columns):
 #   Column   Non-Null Count  Dtype
---  ------   --------------  -----
 0   time     31363 non-null  datetime64[ns]
 1   btc_etf  31363 non-null  int64
 2   eth_etf  31363 non-null  int64
dtypes: datetime64[ns](1), int64(2)
memory usage: 735.2 KB
# Checking one event day (2021-11-15) to confirm that btc_etf is 1 for every hour.
nov_15_rows = df[df['time'].dt.date == pd.to_datetime('2021-11-15').date()]
nov_15_rows
| | time | btc_etf | eth_etf |
|---|---|---|---|
| 7632 | 2021-11-15 00:00:00 | 1 | 0 |
| 7633 | 2021-11-15 01:00:00 | 1 | 0 |
| 7634 | 2021-11-15 02:00:00 | 1 | 0 |
| 7635 | 2021-11-15 03:00:00 | 1 | 0 |
| 7636 | 2021-11-15 04:00:00 | 1 | 0 |
| 7637 | 2021-11-15 05:00:00 | 1 | 0 |
| 7638 | 2021-11-15 06:00:00 | 1 | 0 |
| 7639 | 2021-11-15 07:00:00 | 1 | 0 |
| 7640 | 2021-11-15 08:00:00 | 1 | 0 |
| 7641 | 2021-11-15 09:00:00 | 1 | 0 |
| 7642 | 2021-11-15 10:00:00 | 1 | 0 |
| 7643 | 2021-11-15 11:00:00 | 1 | 0 |
| 7644 | 2021-11-15 12:00:00 | 1 | 0 |
| 7645 | 2021-11-15 13:00:00 | 1 | 0 |
| 7646 | 2021-11-15 14:00:00 | 1 | 0 |
| 7647 | 2021-11-15 15:00:00 | 1 | 0 |
| 7648 | 2021-11-15 16:00:00 | 1 | 0 |
| 7649 | 2021-11-15 17:00:00 | 1 | 0 |
| 7650 | 2021-11-15 18:00:00 | 1 | 0 |
| 7651 | 2021-11-15 19:00:00 | 1 | 0 |
| 7652 | 2021-11-15 20:00:00 | 1 | 0 |
| 7653 | 2021-11-15 21:00:00 | 1 | 0 |
| 7654 | 2021-11-15 22:00:00 | 1 | 0 |
| 7655 | 2021-11-15 23:00:00 | 1 | 0 |
# Net Volume Delta ('NVD') and its running total, Cumulative Volume Delta ('CVD')
btcusdt_spot['NVD'] = btcusdt_spot['buy_coin_volume'] - btcusdt_spot['sell_coin_volume']
btcusdt_spot['CVD'] = btcusdt_spot['NVD'].cumsum()
ethusdt_spot['NVD'] = ethusdt_spot['buy_coin_volume'] - ethusdt_spot['sell_coin_volume']
ethusdt_spot['CVD'] = ethusdt_spot['NVD'].cumsum()
btcusdt_futures['NVD'] = btcusdt_futures['buy_coin_volume'] - btcusdt_futures['sell_coin_volume']
btcusdt_futures['CVD'] = btcusdt_futures['NVD'].cumsum()
ethusdt_futures['NVD'] = ethusdt_futures['buy_coin_volume'] - ethusdt_futures['sell_coin_volume']
ethusdt_futures['CVD'] = ethusdt_futures['NVD'].cumsum()
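The four blocks above repeat one pattern: NVD is the hour's buy volume minus its sell volume, and CVD is the running sum of NVD. A minimal sketch of a helper that expresses this once, assuming the `buy_coin_volume` and `sell_coin_volume` columns used above. The `add_volume_delta` name and the toy numbers are illustrative.

```python
import pandas as pd

def add_volume_delta(frame: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of `frame` with 'NVD' and 'CVD' columns added."""
    out = frame.copy()
    # Net volume delta: taker buy volume minus taker sell volume per row.
    out["NVD"] = out["buy_coin_volume"] - out["sell_coin_volume"]
    # Cumulative volume delta: running sum of NVD in row order.
    out["CVD"] = out["NVD"].cumsum()
    return out

toy = pd.DataFrame({
    "buy_coin_volume": [5.0, 2.0, 4.0],
    "sell_coin_volume": [3.0, 6.0, 1.0],
})
result = add_volume_delta(toy)
print(result)
```

Since CVD is order-dependent, the rows should already be sorted by time before the helper is called.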
# Renaming Columns for Clarity
btcusdt_spot.columns = ['spot_btc_' + col if col != 'time' else col for col in btcusdt_spot.columns]
ethusdt_spot.columns = ['spot_eth_' + col if col != 'time' else col for col in ethusdt_spot.columns]
btcusdt_futures.columns = ['futures_btc_' + col if col != 'time' else col for col in btcusdt_futures.columns]
ethusdt_futures.columns = ['futures_eth_' + col if col != 'time' else col for col in ethusdt_futures.columns]
spot_df = pd.merge(btcusdt_spot, ethusdt_spot, on='time')
spot_df.info()
<class 'pandas.core.frame.DataFrame'> RangeIndex: 31363 entries, 0 to 31362 Data columns (total 37 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 spot_btc_exchange 31363 non-null object 1 spot_btc_coin 31363 non-null object 2 spot_btc_product 31363 non-null object 3 time 31363 non-null object 4 spot_btc_open_price 31363 non-null float64 5 spot_btc_high_price 31363 non-null float64 6 spot_btc_low_price 31363 non-null float64 7 spot_btc_close_price 31363 non-null float64 8 spot_btc_coin_volume 31363 non-null float64 9 spot_btc_dollar_volume 31363 non-null float64 10 spot_btc_buy_trades 31363 non-null int64 11 spot_btc_sell_trades 31363 non-null int64 12 spot_btc_total_trades 31363 non-null int64 13 spot_btc_buy_coin_volume 31363 non-null float64 14 spot_btc_sell_coin_volume 31363 non-null float64 15 spot_btc_buy_dollar_volume 31363 non-null float64 16 spot_btc_sell_dollar_volume 31363 non-null float64 17 spot_btc_NVD 31363 non-null float64 18 spot_btc_CVD 31363 non-null float64 19 spot_eth_exchange 31363 non-null object 20 spot_eth_coin 31363 non-null object 21 spot_eth_product 31363 non-null object 22 spot_eth_open_price 31363 non-null float64 23 spot_eth_high_price 31363 non-null float64 24 spot_eth_low_price 31363 non-null float64 25 spot_eth_close_price 31363 non-null float64 26 spot_eth_coin_volume 31363 non-null float64 27 spot_eth_dollar_volume 31363 non-null float64 28 spot_eth_buy_trades 31363 non-null int64 29 spot_eth_sell_trades 31363 non-null int64 30 spot_eth_total_trades 31363 non-null int64 31 spot_eth_buy_coin_volume 31363 non-null float64 32 spot_eth_sell_coin_volume 31363 non-null float64 33 spot_eth_buy_dollar_volume 31363 non-null float64 34 spot_eth_sell_dollar_volume 31363 non-null float64 35 spot_eth_NVD 31363 non-null float64 36 spot_eth_CVD 31363 non-null float64 dtypes: float64(24), int64(6), object(7) memory usage: 8.9+ MB
# Unnecessary Columns to Drop
columns_to_delete = ['spot_btc_exchange', 'spot_btc_coin', 'spot_btc_product',
'spot_eth_exchange', 'spot_eth_coin', 'spot_eth_product']
spot_df.drop(columns = columns_to_delete, inplace= True)
# Removing Bybit Data from the Dataset Because Only Binance Data Will Be Used
btcusdt_futures = btcusdt_futures[btcusdt_futures["futures_btc_exchange"] != 'bybit']
ethusdt_futures = ethusdt_futures[ethusdt_futures["futures_eth_exchange"] != 'bybit']
futures_df = pd.merge(btcusdt_futures, ethusdt_futures, on='time')
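Because CVD is a running sum, its value depends on which rows were present when `cumsum` ran. In this notebook the futures CVD is computed before the Bybit rows are removed, so if the raw files interleave both exchanges, the Binance CVD that survives the filter still carries Bybit flow. A hedged sketch of one way to avoid that, assuming an `exchange` column as in the raw files; the toy values are illustrative.

```python
import pandas as pd

# Toy raw futures data with two exchanges interleaved per hour (illustrative values).
raw = pd.DataFrame({
    "exchange": ["binance", "bybit", "binance", "bybit"],
    "time": ["2021-01-01 00:00:00", "2021-01-01 00:00:00",
             "2021-01-01 01:00:00", "2021-01-01 01:00:00"],
    "buy_coin_volume": [10.0, 1.0, 4.0, 2.0],
    "sell_coin_volume": [7.0, 5.0, 6.0, 1.0],
})
raw["NVD"] = raw["buy_coin_volume"] - raw["sell_coin_volume"]

# Running sum over all rows: mixes the flow of both exchanges.
mixed_cvd = raw["NVD"].cumsum()

# Running sum within each exchange: each venue gets its own CVD.
raw["CVD"] = raw.groupby("exchange")["NVD"].cumsum()

binance = raw[raw["exchange"] == "binance"].reset_index(drop=True)
print(binance[["time", "NVD", "CVD"]])
```

Filtering to a single exchange first and then calling `cumsum` gives the same Binance values as the grouped version.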
# Unnecessary Columns to Drop
columns_to_delete_futures = ['futures_btc_exchange', 'futures_btc_coin', 'futures_btc_product',
'futures_eth_exchange', 'futures_eth_coin', 'futures_eth_product']
futures_df.drop(columns = columns_to_delete_futures, inplace= True)
futures_df.info()
<class 'pandas.core.frame.DataFrame'> RangeIndex: 31363 entries, 0 to 31362 Data columns (total 63 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 time 31363 non-null object 1 futures_btc_open_price 31363 non-null float64 2 futures_btc_high_price 31363 non-null float64 3 futures_btc_low_price 31363 non-null float64 4 futures_btc_close_price 31363 non-null float64 5 futures_btc_coin_volume 31363 non-null float64 6 futures_btc_dollar_volume 31363 non-null float64 7 futures_btc_buy_trades 31363 non-null int64 8 futures_btc_sell_trades 31363 non-null int64 9 futures_btc_total_trades 31363 non-null int64 10 futures_btc_buy_coin_volume 31363 non-null float64 11 futures_btc_sell_coin_volume 31363 non-null float64 12 futures_btc_buy_dollar_volume 31363 non-null float64 13 futures_btc_sell_dollar_volume 31363 non-null float64 14 futures_btc_coin_open_interest_high 31363 non-null float64 15 futures_btc_coin_open_interest_low 31363 non-null float64 16 futures_btc_coin_open_interest_close 31363 non-null float64 17 futures_btc_dollar_open_interest_high 31363 non-null float64 18 futures_btc_dollar_open_interest_low 31363 non-null float64 19 futures_btc_dollar_open_interest_close 31363 non-null float64 20 futures_btc_funding_rate 31363 non-null float64 21 futures_btc_premium 31360 non-null float64 22 futures_btc_buy_liquidations 31363 non-null int64 23 futures_btc_sell_liquidations 31363 non-null int64 24 futures_btc_buy_liquidations_coin_volume 31363 non-null float64 25 futures_btc_sell_liquidations_coin_volume 31363 non-null float64 26 futures_btc_liquidations_coin_volume 31363 non-null float64 27 futures_btc_buy_liquidations_dollar_volume 31363 non-null float64 28 futures_btc_sell_liquidations_dollar_volume 31363 non-null float64 29 futures_btc_liquidations_dollar_volume 31363 non-null float64 30 futures_btc_NVD 31363 non-null float64 31 futures_btc_CVD 31363 non-null float64 32 futures_eth_open_price 31363 non-null float64 33 futures_eth_high_price 
31363 non-null float64 34 futures_eth_low_price 31363 non-null float64 35 futures_eth_close_price 31363 non-null float64 36 futures_eth_coin_volume 31363 non-null float64 37 futures_eth_dollar_volume 31363 non-null float64 38 futures_eth_buy_trades 31363 non-null int64 39 futures_eth_sell_trades 31363 non-null int64 40 futures_eth_total_trades 31363 non-null int64 41 futures_eth_buy_coin_volume 31363 non-null float64 42 futures_eth_sell_coin_volume 31363 non-null float64 43 futures_eth_buy_dollar_volume 31363 non-null float64 44 futures_eth_sell_dollar_volume 31363 non-null float64 45 futures_eth_coin_open_interest_high 31363 non-null float64 46 futures_eth_coin_open_interest_low 31363 non-null float64 47 futures_eth_coin_open_interest_close 31363 non-null float64 48 futures_eth_dollar_open_interest_high 31363 non-null float64 49 futures_eth_dollar_open_interest_low 31363 non-null float64 50 futures_eth_dollar_open_interest_close 31363 non-null float64 51 futures_eth_funding_rate 31363 non-null float64 52 futures_eth_premium 31360 non-null float64 53 futures_eth_buy_liquidations 31363 non-null int64 54 futures_eth_sell_liquidations 31363 non-null int64 55 futures_eth_buy_liquidations_coin_volume 31363 non-null float64 56 futures_eth_sell_liquidations_coin_volume 31363 non-null float64 57 futures_eth_liquidations_coin_volume 31363 non-null float64 58 futures_eth_buy_liquidations_dollar_volume 31363 non-null float64 59 futures_eth_sell_liquidations_dollar_volume 31363 non-null float64 60 futures_eth_liquidations_dollar_volume 31363 non-null float64 61 futures_eth_NVD 31363 non-null float64 62 futures_eth_CVD 31363 non-null float64 dtypes: float64(52), int64(10), object(1) memory usage: 15.1+ MB
# Merging Spot & Futures Data
merged_df = pd.merge(spot_df, futures_df, on='time')
merged_df
| | time | spot_btc_open_price | spot_btc_high_price | spot_btc_low_price | spot_btc_close_price | spot_btc_coin_volume | spot_btc_dollar_volume | spot_btc_buy_trades | spot_btc_sell_trades | spot_btc_total_trades | ... | futures_eth_buy_liquidations | futures_eth_sell_liquidations | futures_eth_buy_liquidations_coin_volume | futures_eth_sell_liquidations_coin_volume | futures_eth_liquidations_coin_volume | futures_eth_buy_liquidations_dollar_volume | futures_eth_sell_liquidations_dollar_volume | futures_eth_liquidations_dollar_volume | futures_eth_NVD | futures_eth_CVD |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2021-01-01 00:00:00 | 28975.65 | 29031.34 | 28690.17 | 28995.13 | 2128.921567 | 6.146804e+07 | 27613 | 25162 | 52775 | ... | 5 | 33 | 21.972 | 279.905 | 301.877 | 1.615119e+04 | 204916.42667 | 2.210676e+05 | -14060.299 | -1.406030e+04 |
| 1 | 2021-01-01 01:00:00 | 28995.13 | 29470.00 | 28960.35 | 29409.99 | 5403.068471 | 1.583578e+08 | 59341 | 44555 | 103896 | ... | 42 | 0 | 1861.541 | 0.000 | 1861.541 | 1.383821e+06 | 0.00000 | 1.383821e+06 | 1493.404 | -1.381830e+04 |
| 2 | 2021-01-01 02:00:00 | 29409.99 | 29465.26 | 29120.03 | 29194.65 | 2384.231560 | 6.984265e+07 | 29051 | 28595 | 57646 | ... | 2 | 7 | 0.376 | 29.946 | 30.322 | 2.808902e+02 | 22303.70485 | 2.258460e+04 | -18465.699 | -3.247673e+04 |
| 3 | 2021-01-01 03:00:00 | 29194.65 | 29367.00 | 29150.02 | 29278.40 | 1461.345077 | 4.276078e+07 | 22782 | 19728 | 42510 | ... | 0 | 1 | 0.000 | 1.863 | 1.863 | 0.000000e+00 | 1387.58103 | 1.387581e+03 | 4070.918 | -2.839819e+04 |
| 4 | 2021-01-01 04:00:00 | 29278.40 | 29395.00 | 29029.40 | 29220.31 | 2038.046803 | 5.961464e+07 | 27193 | 28221 | 55414 | ... | 6 | 18 | 27.764 | 72.904 | 100.668 | 2.075376e+04 | 54193.04133 | 7.494680e+04 | -10863.373 | -3.965541e+04 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 31358 | 2024-07-30 14:00:00 | 66381.99 | 66449.99 | 65660.89 | 65810.00 | 2571.180050 | 1.699933e+08 | 60999 | 69820 | 130819 | ... | 1 | 69 | 0.029 | 251.286 | 251.315 | 9.647100e+01 | 831160.05300 | 8.312565e+05 | -4808.555 | -3.346359e+07 |
| 31359 | 2024-07-30 15:00:00 | 65810.00 | 66332.01 | 65555.00 | 66216.01 | 1403.788890 | 9.259820e+07 | 54286 | 48539 | 102825 | ... | 20 | 13 | 27.411 | 19.515 | 46.926 | 9.077219e+04 | 64262.31200 | 1.550345e+05 | 6774.840 | -3.346003e+07 |
| 31360 | 2024-07-30 16:00:00 | 66216.01 | 66550.01 | 66145.00 | 66180.01 | 629.106070 | 4.172559e+07 | 30733 | 25702 | 56435 | ... | 6 | 14 | 81.218 | 18.085 | 99.303 | 2.700290e+05 | 59828.56800 | 3.298576e+05 | -6013.903 | -3.345667e+07 |
| 31361 | 2024-07-30 17:00:00 | 66180.01 | 66210.01 | 65684.68 | 65868.01 | 1013.370530 | 6.679952e+07 | 42195 | 46843 | 89038 | ... | 8 | 77 | 12.896 | 268.487 | 281.383 | 4.259585e+04 | 884233.73000 | 9.268296e+05 | -11256.910 | -3.346888e+07 |
| 31362 | 2024-07-30 18:00:00 | 65868.01 | 66074.98 | 65600.00 | 65730.00 | 764.145240 | 5.029318e+07 | 36161 | 34833 | 70994 | ... | 3 | 30 | 2.361 | 67.421 | 69.782 | 7.780520e+03 | 221305.24500 | 2.290858e+05 | -13051.165 | -3.348799e+07 |
31363 rows × 93 columns
# Calculating futures to spot price ratio
merged_df['btc_futures_to_spot'] = merged_df['futures_btc_close_price'] / merged_df['spot_btc_close_price']
merged_df['eth_futures_to_spot'] = merged_df['futures_eth_close_price'] / merged_df['spot_eth_close_price']
# Changing 'time' column's data type to datetime
merged_df['time'] = pd.to_datetime(merged_df['time'])
merged_df.info()
<class 'pandas.core.frame.DataFrame'> RangeIndex: 31363 entries, 0 to 31362 Data columns (total 95 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 time 31363 non-null datetime64[ns] 1 spot_btc_open_price 31363 non-null float64 2 spot_btc_high_price 31363 non-null float64 3 spot_btc_low_price 31363 non-null float64 4 spot_btc_close_price 31363 non-null float64 5 spot_btc_coin_volume 31363 non-null float64 6 spot_btc_dollar_volume 31363 non-null float64 7 spot_btc_buy_trades 31363 non-null int64 8 spot_btc_sell_trades 31363 non-null int64 9 spot_btc_total_trades 31363 non-null int64 10 spot_btc_buy_coin_volume 31363 non-null float64 11 spot_btc_sell_coin_volume 31363 non-null float64 12 spot_btc_buy_dollar_volume 31363 non-null float64 13 spot_btc_sell_dollar_volume 31363 non-null float64 14 spot_btc_NVD 31363 non-null float64 15 spot_btc_CVD 31363 non-null float64 16 spot_eth_open_price 31363 non-null float64 17 spot_eth_high_price 31363 non-null float64 18 spot_eth_low_price 31363 non-null float64 19 spot_eth_close_price 31363 non-null float64 20 spot_eth_coin_volume 31363 non-null float64 21 spot_eth_dollar_volume 31363 non-null float64 22 spot_eth_buy_trades 31363 non-null int64 23 spot_eth_sell_trades 31363 non-null int64 24 spot_eth_total_trades 31363 non-null int64 25 spot_eth_buy_coin_volume 31363 non-null float64 26 spot_eth_sell_coin_volume 31363 non-null float64 27 spot_eth_buy_dollar_volume 31363 non-null float64 28 spot_eth_sell_dollar_volume 31363 non-null float64 29 spot_eth_NVD 31363 non-null float64 30 spot_eth_CVD 31363 non-null float64 31 futures_btc_open_price 31363 non-null float64 32 futures_btc_high_price 31363 non-null float64 33 futures_btc_low_price 31363 non-null float64 34 futures_btc_close_price 31363 non-null float64 35 futures_btc_coin_volume 31363 non-null float64 36 futures_btc_dollar_volume 31363 non-null float64 37 futures_btc_buy_trades 31363 non-null int64 38 futures_btc_sell_trades 31363 non-null int64 
39 futures_btc_total_trades 31363 non-null int64 40 futures_btc_buy_coin_volume 31363 non-null float64 41 futures_btc_sell_coin_volume 31363 non-null float64 42 futures_btc_buy_dollar_volume 31363 non-null float64 43 futures_btc_sell_dollar_volume 31363 non-null float64 44 futures_btc_coin_open_interest_high 31363 non-null float64 45 futures_btc_coin_open_interest_low 31363 non-null float64 46 futures_btc_coin_open_interest_close 31363 non-null float64 47 futures_btc_dollar_open_interest_high 31363 non-null float64 48 futures_btc_dollar_open_interest_low 31363 non-null float64 49 futures_btc_dollar_open_interest_close 31363 non-null float64 50 futures_btc_funding_rate 31363 non-null float64 51 futures_btc_premium 31360 non-null float64 52 futures_btc_buy_liquidations 31363 non-null int64 53 futures_btc_sell_liquidations 31363 non-null int64 54 futures_btc_buy_liquidations_coin_volume 31363 non-null float64 55 futures_btc_sell_liquidations_coin_volume 31363 non-null float64 56 futures_btc_liquidations_coin_volume 31363 non-null float64 57 futures_btc_buy_liquidations_dollar_volume 31363 non-null float64 58 futures_btc_sell_liquidations_dollar_volume 31363 non-null float64 59 futures_btc_liquidations_dollar_volume 31363 non-null float64 60 futures_btc_NVD 31363 non-null float64 61 futures_btc_CVD 31363 non-null float64 62 futures_eth_open_price 31363 non-null float64 63 futures_eth_high_price 31363 non-null float64 64 futures_eth_low_price 31363 non-null float64 65 futures_eth_close_price 31363 non-null float64 66 futures_eth_coin_volume 31363 non-null float64 67 futures_eth_dollar_volume 31363 non-null float64 68 futures_eth_buy_trades 31363 non-null int64 69 futures_eth_sell_trades 31363 non-null int64 70 futures_eth_total_trades 31363 non-null int64 71 futures_eth_buy_coin_volume 31363 non-null float64 72 futures_eth_sell_coin_volume 31363 non-null float64 73 futures_eth_buy_dollar_volume 31363 non-null float64 74 futures_eth_sell_dollar_volume 31363 non-null 
float64 75 futures_eth_coin_open_interest_high 31363 non-null float64 76 futures_eth_coin_open_interest_low 31363 non-null float64 77 futures_eth_coin_open_interest_close 31363 non-null float64 78 futures_eth_dollar_open_interest_high 31363 non-null float64 79 futures_eth_dollar_open_interest_low 31363 non-null float64 80 futures_eth_dollar_open_interest_close 31363 non-null float64 81 futures_eth_funding_rate 31363 non-null float64 82 futures_eth_premium 31360 non-null float64 83 futures_eth_buy_liquidations 31363 non-null int64 84 futures_eth_sell_liquidations 31363 non-null int64 85 futures_eth_buy_liquidations_coin_volume 31363 non-null float64 86 futures_eth_sell_liquidations_coin_volume 31363 non-null float64 87 futures_eth_liquidations_coin_volume 31363 non-null float64 88 futures_eth_buy_liquidations_dollar_volume 31363 non-null float64 89 futures_eth_sell_liquidations_dollar_volume 31363 non-null float64 90 futures_eth_liquidations_dollar_volume 31363 non-null float64 91 futures_eth_NVD 31363 non-null float64 92 futures_eth_CVD 31363 non-null float64 93 btc_futures_to_spot 31363 non-null float64 94 eth_futures_to_spot 31363 non-null float64 dtypes: datetime64[ns](1), float64(78), int64(16) memory usage: 22.7 MB
# Creating an interactive graph to detect liquidation cascades
fig = px.line(merged_df, x='time', y='futures_btc_close_price',
title='Price Comparison', width=1000, height=600)
fig.show()
# Liquidation cascade intervals, identified from the 'futures_btc_close_price' column:
liquidation_periods = [
('2021-01-09 13:00:00', '2021-01-11 15:00:00'),
('2021-01-19 16:00:00', '2021-01-22 00:00:00'),
('2021-02-21 18:00:00', '2021-02-28 17:00:00'),
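# NOTE: the start below is later than the end, so this interval matches no rows; one of the two dates is probably a typo.
('2021-03-21 12:00:00', '2021-03-05 08:00:00'),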
('2021-03-13 20:00:00', '2021-03-16 05:00:00'),
('2021-03-18 15:00:00', '2021-03-25 15:00:00'),
('2021-04-14 06:00:00', '2021-04-25 21:00:00'),
('2021-05-09 03:00:00', '2021-05-23 16:00:00'),
('2021-06-15 17:00:00', '2021-06-22 13:00:00'),
('2021-09-07 02:00:00', '2021-09-21 22:00:00'),
('2021-10-20 15:00:00', '2021-10-28 00:00:00'),
('2021-11-10 17:00:00', '2021-11-19 03:00:00'),
('2021-12-01 15:00:00', '2021-12-04 11:00:00'),
('2021-12-27 17:00:00', '2022-01-24 12:00:00'),
('2022-02-10 17:00:00', '2022-02-24 05:00:00'),
('2022-03-02 14:00:00', '2022-03-07 19:00:00'),
('2022-03-09 15:00:00', '2022-03-13 22:00:00'),
('2022-03-28 18:00:00', '2022-04-12 19:00:00'),
('2022-05-04 19:00:00', '2022-05-12 05:00:00'),
('2022-06-06 21:00:00', '2022-06-18 20:00:00'),
('2022-08-15 05:00:00', '2022-08-19 23:00:00'),
('2022-09-13 10:00:00', '2022-09-19 08:00:00'),
('2022-11-05 03:00:00', '2022-11-09 21:00:00'),
('2023-02-21 06:00:00', '2023-03-10 10:00:00'),
('2023-08-14 16:00:00', '2023-08-19 06:00:00'),
('2024-01-11 14:00:00', '2024-01-23 14:00:00'),
('2024-03-14 06:00:00', '2024-03-20 05:00:00'),
('2024-03-31 23:00:00', '2024-04-02 15:00:00'),
('2024-04-08 11:00:00', '2024-04-17 16:00:00'),
('2024-04-24 04:00:00', '2024-05-01 15:00:00'),
('2024-06-07 11:00:00', '2024-06-24 19:00:00'),
('2024-07-01 17:00:00', '2024-07-05 04:00:00'),
]
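A hand-typed interval list is easy to get wrong, and an interval whose start is later than its end silently matches no rows when the mask is built. A small sketch of a sanity check; the `find_inverted` helper is hypothetical, and the two intervals are copied from the list above.

```python
import pandas as pd

# Two intervals copied from liquidation_periods; the second has start > end.
periods = [
    ("2021-01-09 13:00:00", "2021-01-11 15:00:00"),
    ("2021-03-21 12:00:00", "2021-03-05 08:00:00"),
]

def find_inverted(periods):
    """Return every (start, end) pair whose start is later than its end."""
    return [(start, end) for start, end in periods
            if pd.Timestamp(start) > pd.Timestamp(end)]

inverted = find_inverted(periods)
print(inverted)
```

Running a check like this on the full list before building the `liquidation_cascades` column surfaces such entries early.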
# Creating the 'liquidation_cascades' column with an initial value of 0
merged_df['liquidation_cascades'] = 0
# Setting 'liquidation_cascades' to 1 for every row that falls inside a cascade period
for start_date, end_date in liquidation_periods:
    mask = (merged_df['time'] >= start_date) & (merged_df['time'] <= end_date)
    merged_df.loc[mask, 'liquidation_cascades'] = 1
# Checking the results
print(merged_df[['time', 'futures_btc_close_price', 'liquidation_cascades']].tail(30))
                     time  futures_btc_close_price  liquidation_cascades
31333 2024-07-29 13:00:00                  69249.8                     0
31334 2024-07-29 14:00:00                  68200.1                     0
31335 2024-07-29 15:00:00                  68067.2                     0
31336 2024-07-29 16:00:00                  66921.0                     0
31337 2024-07-29 17:00:00                  66980.8                     0
31338 2024-07-29 18:00:00                  67389.7                     0
31339 2024-07-29 19:00:00                  67276.6                     0
31340 2024-07-29 20:00:00                  67348.4                     0
31341 2024-07-29 21:00:00                  67459.9                     0
31342 2024-07-29 22:00:00                  67195.3                     0
31343 2024-07-29 23:00:00                  66750.0                     0
31344 2024-07-30 00:00:00                  66572.0                     0
31345 2024-07-30 01:00:00                  66176.5                     0
31346 2024-07-30 02:00:00                  66396.6                     0
31347 2024-07-30 03:00:00                  66585.6                     0
31348 2024-07-30 04:00:00                  66778.2                     0
31349 2024-07-30 05:00:00                  66455.9                     0
31350 2024-07-30 06:00:00                  66733.2                     0
31351 2024-07-30 07:00:00                  66914.0                     0
31352 2024-07-30 08:00:00                  66740.0                     0
31353 2024-07-30 09:00:00                  66580.0                     0
31354 2024-07-30 10:00:00                  66545.7                     0
31355 2024-07-30 11:00:00                  66532.5                     0
31356 2024-07-30 12:00:00                  66635.4                     0
31357 2024-07-30 13:00:00                  66357.3                     0
31358 2024-07-30 14:00:00                  65779.4                     0
31359 2024-07-30 15:00:00                  66198.0                     0
31360 2024-07-30 16:00:00                  66150.2                     0
31361 2024-07-30 17:00:00                  65839.8                     0
31362 2024-07-30 18:00:00                  65619.9                     0
# Traces for the line graph
line_trace = go.Scatter(
x=merged_df['time'],
y=merged_df['futures_btc_close_price'],
mode='lines',
name='futures_btc_close_price',
line=dict(color='blue')
)
# Traces for the liquidation cascade points
liquidation_trace = go.Scatter(
x=merged_df[merged_df['liquidation_cascades'] == 1]['time'],
y=merged_df[merged_df['liquidation_cascades'] == 1]['futures_btc_close_price'],
mode='markers',
name='Liquidation Cascades',
marker=dict(color='red', size=6)
)
# Creating the figure and add the traces
fig = go.Figure()
fig.add_trace(line_trace)
fig.add_trace(liquidation_trace)
fig.update_layout(
title='Price & Liquidation Cascades',
xaxis_title='time',
yaxis_title='futures_btc_close_price',
width=1200,
height=800
)
fig.show()
# Finalising the dataset by adding the ETF event flags
merged_df = pd.merge(merged_df, df, on='time')
The dataset is then reduced to the columns this project focuses on.
Columns = ['time', 'spot_btc_coin_volume', 'spot_btc_dollar_volume', 'spot_btc_total_trades', 'spot_btc_CVD',
'futures_btc_close_price', 'futures_btc_coin_volume', 'futures_btc_dollar_volume', 'futures_btc_total_trades',
'futures_btc_coin_open_interest_close', 'futures_btc_funding_rate', 'futures_btc_liquidations_coin_volume',
'futures_btc_CVD', 'spot_eth_coin_volume', 'spot_eth_dollar_volume', 'spot_eth_total_trades', 'spot_eth_CVD',
'futures_eth_close_price', 'futures_eth_coin_volume', 'futures_eth_dollar_volume', 'futures_eth_total_trades',
'futures_eth_coin_open_interest_close', 'futures_eth_funding_rate', 'futures_eth_liquidations_coin_volume',
'futures_eth_CVD', 'eth_etf', 'btc_etf', 'liquidation_cascades', 'btc_futures_to_spot', 'eth_futures_to_spot']
prediction_df = merged_df[Columns]
prediction_df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 31363 entries, 0 to 31362
Data columns (total 30 columns):
 #   Column                                Non-Null Count  Dtype
---  ------                                --------------  -----
 0   time                                  31363 non-null  datetime64[ns]
 1   spot_btc_coin_volume                  31363 non-null  float64
 2   spot_btc_dollar_volume                31363 non-null  float64
 3   spot_btc_total_trades                 31363 non-null  int64
 4   spot_btc_CVD                          31363 non-null  float64
 5   futures_btc_close_price               31363 non-null  float64
 6   futures_btc_coin_volume               31363 non-null  float64
 7   futures_btc_dollar_volume             31363 non-null  float64
 8   futures_btc_total_trades              31363 non-null  int64
 9   futures_btc_coin_open_interest_close  31363 non-null  float64
 10  futures_btc_funding_rate              31363 non-null  float64
 11  futures_btc_liquidations_coin_volume  31363 non-null  float64
 12  futures_btc_CVD                       31363 non-null  float64
 13  spot_eth_coin_volume                  31363 non-null  float64
 14  spot_eth_dollar_volume                31363 non-null  float64
 15  spot_eth_total_trades                 31363 non-null  int64
 16  spot_eth_CVD                          31363 non-null  float64
 17  futures_eth_close_price               31363 non-null  float64
 18  futures_eth_coin_volume               31363 non-null  float64
 19  futures_eth_dollar_volume             31363 non-null  float64
 20  futures_eth_total_trades              31363 non-null  int64
 21  futures_eth_coin_open_interest_close  31363 non-null  float64
 22  futures_eth_funding_rate              31363 non-null  float64
 23  futures_eth_liquidations_coin_volume  31363 non-null  float64
 24  futures_eth_CVD                       31363 non-null  float64
 25  eth_etf                               31363 non-null  int64
 26  btc_etf                               31363 non-null  int64
 27  liquidation_cascades                  31363 non-null  int64
 28  btc_futures_to_spot                   31363 non-null  float64
 29  eth_futures_to_spot                   31363 non-null  float64
dtypes: datetime64[ns](1), float64(22), int64(7)
memory usage: 7.2 MB
# Setting the 'time' column as the index of the prediction_df DataFrame,
# enabling time-based indexing and facilitating time series data manipulation.
prediction_df = prediction_df.set_index('time')
# Creating the log-return and volatility features
# Calculating hourly log returns
prediction_df['log_returns'] = np.log(prediction_df['futures_btc_close_price'] / prediction_df['futures_btc_close_price'].shift(1))
# Calculating volatility using rolling standard deviation (24-hour window)
prediction_df['volatility'] = prediction_df['log_returns'].rolling(window=24).std()
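The two features above follow the usual definitions: the hourly log return is r_t = ln(P_t / P_{t-1}), and the volatility is the sample standard deviation of the trailing window of those returns. A minimal sketch on toy prices, using a window of 3 instead of 24 to keep the numbers readable; the price values are illustrative.

```python
import numpy as np
import pandas as pd

# Toy hourly closes: +10%, -10%, +10%, then flat (illustrative values).
prices = pd.Series([100.0, 110.0, 99.0, 108.9, 108.9])

# Log return of each hour relative to the previous hour; the first row has no predecessor.
log_returns = np.log(prices / prices.shift(1))

# Rolling sample standard deviation (ddof=1, the pandas default) over the last 3 returns.
volatility = log_returns.rolling(window=3).std()

print(pd.DataFrame({"price": prices, "log_returns": log_returns, "volatility": volatility}))
```

One convenient property of log returns is that they add up across periods: their sum over the series equals the log of the last price divided by the first.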
# For log_returns target
# Correlation matrix and target correlation calculation
cor_btc = prediction_df
correlation_matrix = cor_btc.corr()
target_correlation = correlation_matrix['log_returns'].sort_values(ascending=False)
# Converting to DataFrame for better display
target_correlation_df = target_correlation.reset_index()
target_correlation_df.columns = ['Feature', 'Correlation with Log Returns']
# Displaying
target_correlation_df
| | Feature | Correlation with Log Returns |
|---|---|---|
| 0 | log_returns | 1.000000 |
| 1 | eth_futures_to_spot | 0.026956 |
| 2 | volatility | 0.011714 |
| 3 | futures_btc_close_price | 0.010303 |
| 4 | futures_eth_CVD | 0.008693 |
| 5 | futures_btc_CVD | 0.007548 |
| 6 | futures_eth_coin_open_interest_close | 0.003631 |
| 7 | futures_eth_close_price | 0.002391 |
| 8 | spot_eth_CVD | -0.000906 |
| 9 | eth_etf | -0.002324 |
| 10 | futures_btc_coin_open_interest_close | -0.002597 |
| 11 | spot_btc_CVD | -0.002969 |
| 12 | futures_eth_funding_rate | -0.004351 |
| 13 | btc_futures_to_spot | -0.004866 |
| 14 | btc_etf | -0.005193 |
| 15 | spot_btc_coin_volume | -0.014327 |
| 16 | spot_btc_total_trades | -0.016394 |
| 17 | futures_btc_funding_rate | -0.019416 |
| 18 | spot_btc_dollar_volume | -0.026889 |
| 19 | futures_btc_coin_volume | -0.041359 |
| 20 | futures_btc_total_trades | -0.051736 |
| 21 | futures_btc_dollar_volume | -0.052388 |
| 22 | futures_eth_coin_volume | -0.066181 |
| 23 | spot_eth_total_trades | -0.067869 |
| 24 | spot_eth_coin_volume | -0.070614 |
| 25 | futures_btc_liquidations_coin_volume | -0.076852 |
| 26 | futures_eth_total_trades | -0.078598 |
| 27 | spot_eth_dollar_volume | -0.082632 |
| 28 | futures_eth_dollar_volume | -0.086016 |
| 29 | liquidation_cascades | -0.086660 |
| 30 | futures_eth_liquidations_coin_volume | -0.109899 |
Because volatility is computed over a 24-hour rolling window of log returns, its first 24 rows are missing. Log returns are computed with shift(1), so their first row is missing as well. With more than 31,000 hourly rows in the dataset, dropping these 24 rows has a negligible effect on the following steps.
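The missing-value counts described above can be reproduced on a small synthetic series. This is a minimal sketch: the price values are made up for illustration, and only the `shift(1)` and `rolling(window=24)` calls mirror the notebook.

```python
import numpy as np
import pandas as pd

# Synthetic prices (illustrative values only, not market data)
prices = pd.Series(np.linspace(100.0, 130.0, 30))

# shift(1) has no previous value for the first row, so it becomes NaN
log_returns = np.log(prices / prices.shift(1))

# A 24-row window needs 24 valid returns, so rows 0..23 are NaN
volatility = log_returns.rolling(window=24).std()

n_missing_returns = int(log_returns.isna().sum())
n_missing_volatility = int(volatility.isna().sum())

# dropna() removes every row where either feature is missing
rows_after_dropna = len(pd.concat([log_returns, volatility], axis=1).dropna())

print(n_missing_returns, n_missing_volatility, rows_after_dropna)
```

The same arithmetic explains the `info()` output below: 31363 rows, 31362 non-null log returns, and 31339 non-null volatility values.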
prediction_df.info()
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 31363 entries, 2021-01-01 00:00:00 to 2024-07-30 18:00:00
Data columns (total 31 columns):

| # | Column | Non-Null Count | Dtype |
|---|---|---|---|
| 0 | spot_btc_coin_volume | 31363 non-null | float64 |
| 1 | spot_btc_dollar_volume | 31363 non-null | float64 |
| 2 | spot_btc_total_trades | 31363 non-null | int64 |
| 3 | spot_btc_CVD | 31363 non-null | float64 |
| 4 | futures_btc_close_price | 31363 non-null | float64 |
| 5 | futures_btc_coin_volume | 31363 non-null | float64 |
| 6 | futures_btc_dollar_volume | 31363 non-null | float64 |
| 7 | futures_btc_total_trades | 31363 non-null | int64 |
| 8 | futures_btc_coin_open_interest_close | 31363 non-null | float64 |
| 9 | futures_btc_funding_rate | 31363 non-null | float64 |
| 10 | futures_btc_liquidations_coin_volume | 31363 non-null | float64 |
| 11 | futures_btc_CVD | 31363 non-null | float64 |
| 12 | spot_eth_coin_volume | 31363 non-null | float64 |
| 13 | spot_eth_dollar_volume | 31363 non-null | float64 |
| 14 | spot_eth_total_trades | 31363 non-null | int64 |
| 15 | spot_eth_CVD | 31363 non-null | float64 |
| 16 | futures_eth_close_price | 31363 non-null | float64 |
| 17 | futures_eth_coin_volume | 31363 non-null | float64 |
| 18 | futures_eth_dollar_volume | 31363 non-null | float64 |
| 19 | futures_eth_total_trades | 31363 non-null | int64 |
| 20 | futures_eth_coin_open_interest_close | 31363 non-null | float64 |
| 21 | futures_eth_funding_rate | 31363 non-null | float64 |
| 22 | futures_eth_liquidations_coin_volume | 31363 non-null | float64 |
| 23 | futures_eth_CVD | 31363 non-null | float64 |
| 24 | eth_etf | 31363 non-null | int64 |
| 25 | btc_etf | 31363 non-null | int64 |
| 26 | liquidation_cascades | 31363 non-null | int64 |
| 27 | btc_futures_to_spot | 31363 non-null | float64 |
| 28 | eth_futures_to_spot | 31363 non-null | float64 |
| 29 | log_returns | 31362 non-null | float64 |
| 30 | volatility | 31339 non-null | float64 |

dtypes: float64(24), int64(7)
memory usage: 7.7 MB
prediction_df = prediction_df.dropna()
# Saving prediction_df; the 'time' index is kept so the timestamps are not lost
prediction_df.to_csv('prediction_df.csv')
# For log_returns target
X = prediction_df.drop(columns=['log_returns'])
y = prediction_df['log_returns']
model = XGBRegressor(n_estimators=100, random_state=42)
model.fit(X, y)
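# Create the axes explicitly; without ax=..., plot_importance draws on a new
# default-sized figure and the 18x16 figure stays empty
fig, ax = plt.subplots(figsize=(18, 16))
plot_importance(model, ax=ax)
ax.set_title('Feature Importance for Log Returns')
plt.show()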
perm_importance = permutation_importance(model, X, y, n_repeats=10, random_state=42)
perm_importance_df = pd.DataFrame({'Feature': X.columns, 'Importance': perm_importance.importances_mean})
perm_importance_df = perm_importance_df.sort_values(by='Importance', ascending=False)
perm_importance_df
| | Feature | Importance |
|---|---|---|
| 27 | btc_futures_to_spot | 0.271289 |
| 4 | futures_btc_close_price | 0.233474 |
| 22 | futures_eth_liquidations_coin_volume | 0.211302 |
| 6 | futures_btc_dollar_volume | 0.191674 |
| 28 | eth_futures_to_spot | 0.180187 |
| 10 | futures_btc_liquidations_coin_volume | 0.137095 |
| 12 | spot_eth_coin_volume | 0.117724 |
| 18 | futures_eth_dollar_volume | 0.116423 |
| 16 | futures_eth_close_price | 0.111468 |
| 13 | spot_eth_dollar_volume | 0.108617 |
| 20 | futures_eth_coin_open_interest_close | 0.107966 |
| 9 | futures_btc_funding_rate | 0.107029 |
| 5 | futures_btc_coin_volume | 0.092551 |
| 26 | liquidation_cascades | 0.081697 |
| 7 | futures_btc_total_trades | 0.078206 |
| 29 | volatility | 0.077506 |
| 21 | futures_eth_funding_rate | 0.070549 |
| 19 | futures_eth_total_trades | 0.068990 |
| 15 | spot_eth_CVD | 0.068131 |
| 8 | futures_btc_coin_open_interest_close | 0.065895 |
| 0 | spot_btc_coin_volume | 0.064768 |
| 2 | spot_btc_total_trades | 0.062339 |
| 1 | spot_btc_dollar_volume | 0.059547 |
| 17 | futures_eth_coin_volume | 0.058410 |
| 3 | spot_btc_CVD | 0.056864 |
| 23 | futures_eth_CVD | 0.056547 |
| 14 | spot_eth_total_trades | 0.051055 |
| 11 | futures_btc_CVD | 0.050401 |
| 25 | btc_etf | 0.000063 |
| 24 | eth_etf | 0.000051 |
mi = mutual_info_regression(X, y)
mi_importance = pd.DataFrame({'Feature': X.columns, 'Importance Log Returns': mi})
mi_importance = mi_importance.sort_values(by='Importance Log Returns', ascending=False)
print(mi_importance)
| | Feature | Importance Log Returns |
|---|---|---|
| 6 | futures_btc_dollar_volume | 0.285483 |
| 10 | futures_btc_liquidations_coin_volume | 0.285337 |
| 7 | futures_btc_total_trades | 0.255280 |
| 13 | spot_eth_dollar_volume | 0.226058 |
| 14 | spot_eth_total_trades | 0.216307 |
| 12 | spot_eth_coin_volume | 0.215852 |
| 5 | futures_btc_coin_volume | 0.206107 |
| 22 | futures_eth_liquidations_coin_volume | 0.190577 |
| 19 | futures_eth_total_trades | 0.171471 |
| 18 | futures_eth_dollar_volume | 0.167824 |
| 0 | spot_btc_coin_volume | 0.166778 |
| 1 | spot_btc_dollar_volume | 0.166540 |
| 2 | spot_btc_total_trades | 0.149137 |
| 29 | volatility | 0.139134 |
| 17 | futures_eth_coin_volume | 0.135602 |
| 3 | spot_btc_CVD | 0.085274 |
| 11 | futures_btc_CVD | 0.066029 |
| 23 | futures_eth_CVD | 0.062902 |
| 8 | futures_btc_coin_open_interest_close | 0.062701 |
| 15 | spot_eth_CVD | 0.060193 |
| 4 | futures_btc_close_price | 0.046716 |
| 20 | futures_eth_coin_open_interest_close | 0.043631 |
| 27 | btc_futures_to_spot | 0.037968 |
| 16 | futures_eth_close_price | 0.026432 |
| 9 | futures_btc_funding_rate | 0.023495 |
| 28 | eth_futures_to_spot | 0.021930 |
| 21 | futures_eth_funding_rate | 0.016357 |
| 26 | liquidation_cascades | 0.008262 |
| 24 | eth_etf | 0.000138 |
| 25 | btc_etf | 0.000000 |
cor_btc = prediction_df
correlation_matrix = cor_btc.corr()
target_correlation = correlation_matrix['futures_btc_close_price'].sort_values(ascending=False)
target_correlation_df = target_correlation.reset_index()
target_correlation_df.columns = ['Feature', 'Correlation with BTC Futures Price']
target_correlation_df
| | Feature | Correlation with BTC Futures Price |
|---|---|---|
| 0 | futures_btc_close_price | 1.000000 |
| 1 | futures_eth_close_price | 0.822552 |
| 2 | btc_futures_to_spot | 0.386361 |
| 3 | futures_btc_funding_rate | 0.320078 |
| 4 | futures_eth_funding_rate | 0.286436 |
| 5 | eth_futures_to_spot | 0.282220 |
| 6 | liquidation_cascades | 0.221141 |
| 7 | spot_eth_dollar_volume | 0.186435 |
| 8 | spot_eth_total_trades | 0.180385 |
| 9 | futures_btc_dollar_volume | 0.144424 |
| 10 | volatility | 0.087102 |
| 11 | spot_btc_CVD | 0.056154 |
| 12 | futures_eth_dollar_volume | 0.039171 |
| 13 | futures_btc_liquidations_coin_volume | 0.037000 |
| 14 | btc_etf | 0.024495 |
| 15 | futures_eth_liquidations_coin_volume | 0.016909 |
| 16 | log_returns | 0.010335 |
| 17 | eth_etf | 0.006500 |
| 18 | futures_eth_total_trades | -0.022737 |
| 19 | futures_btc_total_trades | -0.041708 |
| 20 | spot_eth_coin_volume | -0.085156 |
| 21 | spot_eth_CVD | -0.097582 |
| 22 | spot_btc_dollar_volume | -0.140402 |
| 23 | futures_btc_coin_volume | -0.224072 |
| 24 | futures_eth_coin_volume | -0.269293 |
| 25 | spot_btc_total_trades | -0.354207 |
| 26 | futures_eth_coin_open_interest_close | -0.398240 |
| 27 | spot_btc_coin_volume | -0.414169 |
| 28 | futures_eth_CVD | -0.426634 |
| 29 | futures_btc_CVD | -0.434872 |
| 30 | futures_btc_coin_open_interest_close | -0.621279 |
from xgboost import XGBRegressor
from xgboost import plot_importance
import matplotlib.pyplot as plt
# Define the input features and target variable
X = prediction_df.drop(columns=['futures_btc_close_price'])
y = prediction_df['futures_btc_close_price'] # Target variable
model = XGBRegressor(n_estimators=100, random_state=42)
model.fit(X, y)
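# Create the axes explicitly; without ax=..., plot_importance draws on a new
# default-sized figure and the 18x16 figure stays empty
fig, ax = plt.subplots(figsize=(18, 16))
plot_importance(model, ax=ax)
ax.set_title('Feature Importance for BTC Futures Price')
plt.show()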
perm_importance = permutation_importance(model, X, y, n_repeats=10, random_state=42)
perm_importance_df = pd.DataFrame({'Feature': X.columns, 'Importance': perm_importance.importances_mean})
perm_importance_df = perm_importance_df.sort_values(by='Importance', ascending=False)
perm_importance_df
| | Feature | Importance |
|---|---|---|
| 15 | futures_eth_close_price | 0.719054 |
| 22 | futures_eth_CVD | 0.226792 |
| 3 | spot_btc_CVD | 0.114155 |
| 7 | futures_btc_coin_open_interest_close | 0.092609 |
| 10 | futures_btc_CVD | 0.025362 |
| 14 | spot_eth_CVD | 0.013361 |
| 26 | btc_futures_to_spot | 0.006731 |
| 19 | futures_eth_coin_open_interest_close | 0.003811 |
| 27 | eth_futures_to_spot | 0.001574 |
| 8 | futures_btc_funding_rate | 0.000750 |
| 20 | futures_eth_funding_rate | 0.000439 |
| 29 | volatility | 0.000378 |
| 2 | spot_btc_total_trades | 0.000259 |
| 5 | futures_btc_dollar_volume | 0.000175 |
| 25 | liquidation_cascades | 0.000140 |
| 1 | spot_btc_dollar_volume | 0.000111 |
| 18 | futures_eth_total_trades | 0.000109 |
| 16 | futures_eth_coin_volume | 0.000084 |
| 28 | log_returns | 0.000078 |
| 0 | spot_btc_coin_volume | 0.000067 |
| 4 | futures_btc_coin_volume | 0.000046 |
| 21 | futures_eth_liquidations_coin_volume | 0.000030 |
| 9 | futures_btc_liquidations_coin_volume | 0.000029 |
| 13 | spot_eth_total_trades | 0.000026 |
| 11 | spot_eth_coin_volume | 0.000024 |
| 17 | futures_eth_dollar_volume | 0.000020 |
| 6 | futures_btc_total_trades | 0.000017 |
| 12 | spot_eth_dollar_volume | 0.000009 |
| 23 | eth_etf | 0.000006 |
| 24 | btc_etf | 0.000003 |
mi = mutual_info_regression(X, y)
mi_importance = pd.DataFrame({'Feature': X.columns, 'Importance BTC Futures Price': mi})
mi_importance = mi_importance.sort_values(by='Importance BTC Futures Price', ascending=False)
print(mi_importance)
| | Feature | Importance BTC Futures Price |
|---|---|---|
| 3 | spot_btc_CVD | 3.118767 |
| 22 | futures_eth_CVD | 2.793937 |
| 10 | futures_btc_CVD | 2.665262 |
| 14 | spot_eth_CVD | 2.497796 |
| 15 | futures_eth_close_price | 2.212861 |
| 7 | futures_btc_coin_open_interest_close | 1.552230 |
| 19 | futures_eth_coin_open_interest_close | 1.516246 |
| 29 | volatility | 0.805033 |
| 2 | spot_btc_total_trades | 0.463585 |
| 20 | futures_eth_funding_rate | 0.450078 |
| 26 | btc_futures_to_spot | 0.442779 |
| 8 | futures_btc_funding_rate | 0.435737 |
| 27 | eth_futures_to_spot | 0.411590 |
| 0 | spot_btc_coin_volume | 0.409441 |
| 1 | spot_btc_dollar_volume | 0.295178 |
| 12 | spot_eth_dollar_volume | 0.222542 |
| 13 | spot_eth_total_trades | 0.212824 |
| 11 | spot_eth_coin_volume | 0.178857 |
| 16 | futures_eth_coin_volume | 0.177275 |
| 5 | futures_btc_dollar_volume | 0.148076 |
| 18 | futures_eth_total_trades | 0.140082 |
| 4 | futures_btc_coin_volume | 0.139650 |
| 9 | futures_btc_liquidations_coin_volume | 0.127571 |
| 17 | futures_eth_dollar_volume | 0.126356 |
| 6 | futures_btc_total_trades | 0.122911 |
| 21 | futures_eth_liquidations_coin_volume | 0.096202 |
| 25 | liquidation_cascades | 0.082224 |
| 28 | log_returns | 0.046752 |
| 24 | btc_etf | 0.012384 |
| 23 | eth_etf | 0.009579 |
To plot the independent variables and the target variable on the same axes, all variables are rescaled to the [0, 1] range with MinMaxScaler.
# Preserve the original time series
time_index = prediction_df.index
# Create the MinMaxScaler
scaler = MinMaxScaler()
# Apply MinMax scaling to prediction_df (excluding the index)
scaled_data = scaler.fit_transform(prediction_df)
# Store the scaled data as a DataFrame
minmax_df = pd.DataFrame(scaled_data, columns=prediction_df.columns, index=time_index)
# The time series has been re-added as the index
minmax_df.index.name = 'time'
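For reference, the min-max transformation applied above maps each column to [0, 1] through (x - min) / (max - min). The sketch below reproduces that formula with plain pandas on a toy frame; `minmax_scale_frame` and the toy values are hypothetical and are not part of the notebook.

```python
import pandas as pd


def minmax_scale_frame(df):
    """Scale every column to [0, 1] using (x - min) / (max - min)."""
    col_min = df.min()
    col_range = df.max() - col_min
    # A constant column has zero range; divide by 1 so it maps to 0
    col_range = col_range.replace(0, 1)
    return (df - col_min) / col_range


# Toy data (illustrative values only)
toy = pd.DataFrame({
    "price": [20000.0, 30000.0, 60000.0],
    "volume": [1.0, 5.0, 3.0],
})
scaled = minmax_scale_frame(toy)
print(scaled)
```

Because each column is scaled independently, the plots that follow compare the shape of two series over time, not their absolute levels.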
import matplotlib.pyplot as plt
# A figure is created with a specified size
plt.figure(figsize=(14, 8))
# The scaled BTC futures close price
plt.plot(minmax_df.index, minmax_df['futures_btc_close_price'], color='blue', linewidth=1, label='futures_btc_close_price')
# The scaled ETH futures close price
plt.plot(minmax_df.index, minmax_df['futures_eth_close_price'], color='green', linewidth=1, alpha=0.5, label='futures_eth_close_price')
# Points where liquidation cascades occur are highlighted by scattering red dots on the BTC futures close price
plt.scatter(minmax_df[minmax_df['liquidation_cascades'] == 1].index,
minmax_df[minmax_df['liquidation_cascades'] == 1]['futures_btc_close_price'],
color='red', s=30, label='Liquidation Cascades') # The 's' value is increased for larger dots
# A title is added to the plot
plt.title('BTC Price and ETH Price with Liquidation Cascades')
plt.xlabel('Time')
plt.ylabel('Scaled Price')
# A legend is added below the plot, centered.
plt.legend(loc='upper center', bbox_to_anchor=(0.5, -0.1), fancybox=True, shadow=True, ncol=3)
plt.show()
ETH and BTC futures prices are strongly correlated (0.82 in the correlation table above): the two assets generally move in the same direction and with similar magnitude, which suggests that common market dynamics influence both. Liquidation cascades coincide with significant drops in both ETH and BTC prices, so liquidations appear to be associated with higher market volatility for ETH as well as BTC. The price movements of the two assets are closely interconnected.
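Correlation between price levels can be inflated by shared trends, so one way to check the co-movement described above is to compare it with the correlation of hourly log returns. The following is a minimal sketch on synthetic data; `level_and_return_correlation` is a hypothetical helper, and the simulated series only stand in for `futures_btc_close_price` and `futures_eth_close_price`.

```python
import numpy as np
import pandas as pd


def level_and_return_correlation(price_a, price_b):
    """Return (correlation of price levels, correlation of log returns)."""
    level_corr = price_a.corr(price_b)
    returns_a = np.log(price_a / price_a.shift(1))
    returns_b = np.log(price_b / price_b.shift(1))
    # Series.corr ignores the NaN produced by shift(1)
    return level_corr, returns_a.corr(returns_b)


# Synthetic prices driven by one shared shock plus asset-specific noise
rng = np.random.default_rng(0)
common = rng.normal(0, 0.01, 500)
btc = pd.Series(30000 * np.exp(np.cumsum(common + rng.normal(0, 0.003, 500))))
eth = pd.Series(2000 * np.exp(np.cumsum(common + rng.normal(0, 0.003, 500))))

level_corr, return_corr = level_and_return_correlation(btc, eth)
print(level_corr, return_corr)
```

On the notebook's data the call would be `level_and_return_correlation(prediction_df['futures_btc_close_price'], prediction_df['futures_eth_close_price'])`.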
plt.figure(figsize=(14, 8))
plt.plot(minmax_df.index, minmax_df['futures_btc_close_price'], color='blue', linewidth=1, label='futures_btc_close_price')
plt.plot(minmax_df.index, minmax_df['spot_btc_CVD'], color='green', linewidth=1, alpha=0.5, label='spot_btc_CVD')
plt.scatter(minmax_df[minmax_df['liquidation_cascades'] == 1].index,
minmax_df[minmax_df['liquidation_cascades'] == 1]['futures_btc_close_price'],
color='red', s=30, label='Liquidation Cascades') # Increased 's' value for larger dots
plt.title('BTC Price and spot_btc_CVD with Liquidation Cascades')
plt.xlabel('Time')
plt.ylabel('Scaled Price')
plt.legend(loc='upper center', bbox_to_anchor=(0.5, -0.1), fancybox=True, shadow=True, ncol=3)
plt.show()
The graph shows that spot_btc_CVD (green line) trends downward over time, indicating a declining cumulative volume delta in the spot BTC market and growing selling pressure. The linear correlation between futures_btc_close_price (blue line) and spot_btc_CVD is close to zero (0.056), but periods of decreasing spot_btc_CVD coincide with larger price fluctuations and more frequent liquidation cascades (red dots). This suggests that declines in spot_btc_CVD are associated with higher BTC price volatility, which liquidations may amplify. Overall, the drop in spot_btc_CVD appears to accompany sharp BTC price declines and liquidations, implying that selling pressure combined with liquidations puts additional pressure on prices.
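The plots in this section all follow one pattern: the scaled BTC futures price, one scaled feature, and red markers at liquidation cascades. A hypothetical helper such as `plot_feature_with_cascades` below could replace the repeated cells. It is a sketch under the assumption that the frame has the same column names as `minmax_df`; the demo frame holds made-up values.

```python
import matplotlib.pyplot as plt
import pandas as pd


def plot_feature_with_cascades(df, feature,
                               price_col="futures_btc_close_price",
                               cascade_col="liquidation_cascades"):
    """Plot the scaled price, one scaled feature, and cascade markers."""
    fig, ax = plt.subplots(figsize=(14, 8))
    ax.plot(df.index, df[price_col], color="blue", linewidth=1, label=price_col)
    ax.plot(df.index, df[feature], color="green", linewidth=1, alpha=0.5, label=feature)
    # Mark cascade hours on the price line
    cascades = df[df[cascade_col] == 1]
    ax.scatter(cascades.index, cascades[price_col], color="red", s=30,
               label="Liquidation Cascades")
    ax.set_title(f"BTC Price and {feature} with Liquidation Cascades")
    ax.set_xlabel("Time")
    ax.set_ylabel("Scaled Value")  # both series are min-max scaled
    ax.legend(loc="upper center", bbox_to_anchor=(0.5, -0.1),
              fancybox=True, shadow=True, ncol=3)
    return ax


# Demo frame with made-up, already scaled values
demo = pd.DataFrame(
    {
        "futures_btc_close_price": [0.2, 0.5, 0.4, 0.9, 0.7],
        "spot_btc_CVD": [1.0, 0.8, 0.6, 0.3, 0.1],
        "liquidation_cascades": [0, 0, 1, 0, 1],
    },
    index=pd.date_range("2021-01-01", periods=5, freq="D"),
)
ax = plot_feature_with_cascades(demo, "spot_btc_CVD")
```

With the notebook's data, each chart would reduce to a call such as `plot_feature_with_cascades(minmax_df, 'futures_btc_CVD')` followed by `plt.show()`.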
plt.figure(figsize=(14, 8))
plt.plot(minmax_df.index, minmax_df['futures_btc_close_price'], color='blue', linewidth=1, label='futures_btc_close_price')
plt.plot(minmax_df.index, minmax_df['futures_btc_CVD'], color='green', linewidth=1, alpha=0.5, label='futures_btc_CVD')
plt.scatter(minmax_df[minmax_df['liquidation_cascades'] == 1].index,
minmax_df[minmax_df['liquidation_cascades'] == 1]['futures_btc_close_price'],
color='red', s=30, label='Liquidation Cascades') # Increased 's' value for larger dots
plt.title('BTC Price and futures_btc_CVD with Liquidation Cascades')
plt.xlabel('Time')
plt.ylabel('Scaled Price')
plt.legend(loc='upper center', bbox_to_anchor=(0.5, -0.1), fancybox=True, shadow=True, ncol=3)
plt.show()
In this graph, futures_btc_CVD (green line) generally trends downward over time, indicating a declining cumulative volume delta in the BTC futures market and possibly growing selling pressure. The negative relationship between the BTC closing price (futures_btc_close_price, blue line) and futures_btc_CVD is notable (correlation -0.43); as futures_btc_CVD decreases, BTC prices fluctuate more and liquidation cascades (red dots) become more frequent. This suggests that a falling futures_btc_CVD is associated with greater volatility in BTC prices, which liquidation cascades may amplify. Overall, selling pressure and a declining volume delta appear to weigh on BTC prices and to deepen price drops when combined with liquidations.
plt.figure(figsize=(14, 8))
plt.plot(minmax_df.index, minmax_df['futures_btc_close_price'], color='blue', linewidth=1, label='futures_btc_close_price')
plt.plot(minmax_df.index, minmax_df['futures_btc_coin_open_interest_close'], color='green', linewidth=1, alpha=0.5, label='futures_btc_coin_open_interest_close')
plt.scatter(minmax_df[minmax_df['liquidation_cascades'] == 1].index,
minmax_df[minmax_df['liquidation_cascades'] == 1]['futures_btc_close_price'],
color='red', s=30, label='Liquidation Cascades') # Increased 's' value for larger dots
plt.title('BTC Price and futures_btc_coin_open_interest_close with Liquidation Cascades')
plt.xlabel('Time')
plt.ylabel('Scaled Price')
plt.legend(loc='upper center', bbox_to_anchor=(0.5, -0.1), fancybox=True, shadow=True, ncol=3)
plt.show()
This graph shows the relationship between futures_btc_close_price (blue line) and futures_btc_coin_open_interest_close (green line), along with liquidation cascades (red dots). The BTC closing price and coin-denominated open interest move in generally opposite directions (correlation -0.62): as futures_btc_coin_open_interest_close rises, BTC prices often decline, and vice versa. Periods of rising open interest often coincide with higher market volatility and liquidation events, suggesting that the market becomes more susceptible to sharp price movements and liquidations as open interest grows. Open interest therefore appears to play an important role in market behavior, particularly in periods of market stress.
plt.figure(figsize=(14, 8))
plt.plot(minmax_df.index, minmax_df['futures_btc_close_price'], color='blue', linewidth=1, label='futures_btc_close_price')
plt.plot(minmax_df.index, minmax_df['spot_eth_CVD'], color='green', linewidth=1, alpha=0.5, label='spot_eth_CVD')
plt.scatter(minmax_df[minmax_df['liquidation_cascades'] == 1].index,
minmax_df[minmax_df['liquidation_cascades'] == 1]['futures_btc_close_price'],
color='red', s=30, label='Liquidation Cascades') # Increased 's' value for larger dots
plt.title('BTC Price and spot_eth_CVD with Liquidation Cascades')
plt.xlabel('Time')
plt.ylabel('Scaled Price')
plt.legend(loc='upper center', bbox_to_anchor=(0.5, -0.1), fancybox=True, shadow=True, ncol=3)
plt.show()
The graph indicates a weakly inverse relationship between spot_eth_CVD and the BTC closing price (correlation -0.10); as spot_eth_CVD decreases, BTC prices tend to show more volatility and sharp drops, often coinciding with liquidation events. A falling cumulative volume delta for ETH may signal increased selling pressure or reduced buying strength, which may in turn weigh on BTC prices, especially during the periods of market stress reflected in the liquidation cascades.
Correlation: There is a negative correlation (-0.397646).
XGBoost: It has been identified as the fifth most important feature (F score: 403.0).
Permutation Feature Importance: It carries moderate importance (0.007885).
Mutual Information: It ranks seventh (1.514458).
Lasso Regression: Despite having a small positive coefficient (1.669578e-02), it remains an important feature in the model.
plt.figure(figsize=(14, 8))
plt.plot(minmax_df.index, minmax_df['futures_btc_close_price'], color='blue', linewidth=1, label='futures_btc_close_price')
plt.plot(minmax_df.index, minmax_df['futures_eth_coin_open_interest_close'], color='green', linewidth=1, alpha=0.5, label='futures_eth_coin_open_interest_close')
plt.scatter(minmax_df[minmax_df['liquidation_cascades'] == 1].index,
minmax_df[minmax_df['liquidation_cascades'] == 1]['futures_btc_close_price'],
color='red', s=30, label='Liquidation Cascades') # Increased 's' value for larger dots
plt.title('BTC Price and futures_eth_coin_open_interest_close with Liquidation Cascades')
plt.xlabel('Time')
plt.ylabel('Scaled Price')
plt.legend(loc='upper center', bbox_to_anchor=(0.5, -0.1), fancybox=True, shadow=True, ncol=3)
plt.show()
The data suggests an inverse relationship between BTC prices and ETH open interest; as the futures_eth_coin_open_interest_close rises, BTC prices tend to decline. Additionally, periods of high ETH open interest often coincide with increased market volatility and liquidation events, indicating that as open interest grows, the market may become more susceptible to price drops and liquidations, especially during stressed market conditions.
Correlation: There is a negative correlation (-0.140156).
XGBoost: It has a lower importance score (F score: 146.0).
Permutation Feature Importance: It has low importance (0.000120).
Mutual Information: It ranks second (0.294437).
plt.figure(figsize=(14, 8))
plt.plot(minmax_df.index, minmax_df['futures_btc_close_price'], color='blue', linewidth=1, label='futures_btc_close_price')
plt.plot(minmax_df.index, minmax_df['spot_btc_dollar_volume'], color='green', linewidth=1, alpha=0.5, label='spot_btc_dollar_volume')
plt.scatter(minmax_df[minmax_df['liquidation_cascades'] == 1].index,
minmax_df[minmax_df['liquidation_cascades'] == 1]['futures_btc_close_price'],
color='red', s=30, label='Liquidation Cascades') # Increased 's' value for larger dots
plt.title('BTC Price and spot_btc_dollar_volume with Liquidation Cascades')
plt.xlabel('Time')
plt.ylabel('Scaled Price')
plt.legend(loc='upper center', bbox_to_anchor=(0.5, -0.1), fancybox=True, shadow=True, ncol=3)
plt.show()
The green line representing spot_btc_dollar_volume shows significant fluctuations, often aligning with periods of high volatility in BTC prices. Spikes in dollar volume frequently precede or coincide with liquidation events, suggesting that bursts of trading activity accompany market stress and the price drops that follow.
plt.figure(figsize=(14, 8))
plt.plot(minmax_df.index, minmax_df['futures_btc_close_price'], color='blue', linewidth=1, label='futures_btc_close_price')
plt.plot(minmax_df.index, minmax_df['futures_btc_dollar_volume'], color='green', linewidth=1, alpha=0.5, label='futures_btc_dollar_volume')
plt.scatter(minmax_df[minmax_df['liquidation_cascades'] == 1].index,
minmax_df[minmax_df['liquidation_cascades'] == 1]['futures_btc_close_price'],
color='red', s=30, label='Liquidation Cascades') # Increased 's' value for larger dots
plt.title('BTC Price and futures_btc_dollar_volume with Liquidation Cascades')
plt.xlabel('Time')
plt.ylabel('Scaled Price')
plt.legend(loc='upper center', bbox_to_anchor=(0.5, -0.1), fancybox=True, shadow=True, ncol=3)
plt.show()
The graph shows the scaled BTC futures close price (blue line) and the futures BTC dollar volume (green line) over time, with liquidation cascades marked by red dots. Dollar volume fluctuates with market conditions and spikes during periods of high trading activity. The liquidation cascades coincide with sharp declines or other significant movements in the BTC close price. This suggests that liquidation cascades tend to occur during periods of increased volatility and substantial trading volume, and that price movements, trading volume, and liquidations are closely related.
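The visual impression that cascades fall on high-volume hours can be checked numerically by comparing volume on cascade and non-cascade rows. The sketch below uses a toy frame with made-up values; `volume_by_cascade_flag` is a hypothetical helper, and the column names are assumed to match `prediction_df`.

```python
import pandas as pd


def volume_by_cascade_flag(df, volume_col, cascade_col="liquidation_cascades"):
    """Median volume for non-cascade (0) and cascade (1) rows."""
    return df.groupby(cascade_col)[volume_col].median()


# Toy data (illustrative values only)
toy = pd.DataFrame({
    "futures_btc_dollar_volume": [10.0, 12.0, 50.0, 11.0, 70.0, 9.0],
    "liquidation_cascades": [0, 0, 1, 0, 1, 0],
})

medians = volume_by_cascade_flag(toy, "futures_btc_dollar_volume")
# Ratio above 1 means cascade hours carry more volume than ordinary hours
ratio = medians.loc[1] / medians.loc[0]
print(medians, ratio)
```

The median is used here instead of the mean because hourly volume is heavily skewed by a few extreme hours; this is a design choice of the sketch, not something the notebook prescribes.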
plt.figure(figsize=(14, 8))
plt.plot(minmax_df.index, minmax_df['futures_btc_close_price'], color='blue', linewidth=1, label='futures_btc_close_price')
plt.plot(minmax_df.index, minmax_df['spot_eth_dollar_volume'], color='green', linewidth=1, alpha=0.5, label='spot_eth_dollar_volume')
plt.scatter(minmax_df[minmax_df['liquidation_cascades'] == 1].index,
minmax_df[minmax_df['liquidation_cascades'] == 1]['futures_btc_close_price'],
color='red', s=30, label='Liquidation Cascades') # Increased 's' value for larger dots
plt.title('BTC Price and spot_eth_dollar_volume with Liquidation Cascades')
plt.xlabel('Time')
plt.ylabel('Scaled Price')
plt.legend(loc='upper center', bbox_to_anchor=(0.5, -0.1), fancybox=True, shadow=True, ncol=3)
plt.show()
The green line representing spot_eth_dollar_volume fluctuates significantly, yet its relationship with the BTC price appears weak. Although some spikes in dollar volume align with BTC price changes, the overall effect on the BTC price trend is small. ETH trading volume interacts with BTC price movements to some degree, but it is not a dominant factor, and its contribution to major price shifts is limited, as the low positive correlation (0.19) indicates.
plt.figure(figsize=(14, 8))
plt.plot(minmax_df.index, minmax_df['futures_btc_close_price'], color='blue', linewidth=1, label='futures_btc_close_price')
plt.plot(minmax_df.index, minmax_df['spot_eth_total_trades'], color='green', linewidth=1, alpha=0.5, label='spot_eth_total_trades')
plt.scatter(minmax_df[minmax_df['liquidation_cascades'] == 1].index,
minmax_df[minmax_df['liquidation_cascades'] == 1]['futures_btc_close_price'],
color='red', s=30, label='Liquidation Cascades') # Increased 's' value for larger dots
plt.title('BTC Price and spot_eth_total_trades with Liquidation Cascades')
plt.xlabel('Time')
plt.ylabel('Scaled Price')
plt.legend(loc='upper center', bbox_to_anchor=(0.5, -0.1), fancybox=True, shadow=True, ncol=3)
plt.show()
The green line representing the total number of spot ETH trades fluctuates frequently, especially during periods of high BTC price volatility. Although some increases in the ETH trade count coincide with BTC price movements, the overall effect on the BTC price appears to be minimal. The weak positive correlation (0.18) indicates a connection between ETH trading activity and BTC price changes, but ETH trading activity is not a significant determinant of BTC price fluctuations.
plt.figure(figsize=(14, 8))
plt.plot(minmax_df.index, minmax_df['futures_btc_close_price'], color='blue', linewidth=1, label='futures_btc_close_price')
plt.plot(minmax_df.index, minmax_df['futures_eth_total_trades'], color='green', linewidth=1, alpha=0.5, label='futures_eth_total_trades')
plt.scatter(minmax_df[minmax_df['liquidation_cascades'] == 1].index,
minmax_df[minmax_df['liquidation_cascades'] == 1]['futures_btc_close_price'],
color='red', s=30, label='Liquidation Cascades') # Increased 's' value for larger dots
plt.title('BTC Price and futures_eth_total_trades with Liquidation Cascades')
plt.xlabel('Time')
plt.ylabel('Scaled Price')
plt.legend(loc='upper center', bbox_to_anchor=(0.5, -0.1), fancybox=True, shadow=True, ncol=3)
plt.show()
The green line shows frequent fluctuations in the total number of ETH futures trades, especially during periods of high BTC price volatility. Despite these fluctuations, the relationship between ETH total trades and the BTC price appears minimal, as the weak negative correlation (-0.02) indicates. Liquidation cascades seem to align with periods of increased trading activity, but the overall influence of ETH total trades on BTC price movements remains limited.
plt.figure(figsize=(14, 8))
plt.plot(minmax_df.index, minmax_df['futures_btc_close_price'], color='blue', linewidth=1, label='futures_btc_close_price')
plt.plot(minmax_df.index, minmax_df['spot_eth_coin_volume'], color='green', linewidth=1, alpha=0.5, label='spot_eth_coin_volume')
plt.scatter(minmax_df[minmax_df['liquidation_cascades'] == 1].index,
minmax_df[minmax_df['liquidation_cascades'] == 1]['futures_btc_close_price'],
color='red', s=30, label='Liquidation Cascades') # Increased 's' value for larger dots
plt.title('BTC Price and spot_eth_coin_volume with Liquidation Cascades')
plt.xlabel('Time')
plt.ylabel('Scaled Price')
plt.legend(loc='upper center', bbox_to_anchor=(0.5, -0.1), fancybox=True, shadow=True, ncol=3)
plt.show()
The green line representing spot_eth_coin_volume fluctuates consistently, especially during periods of increased BTC price volatility. Despite these fluctuations, the relationship between ETH coin volume and the BTC price appears to be minimal, as the weak negative correlation (-0.09) indicates. The alignment of liquidation cascades with spikes in ETH coin volume suggests some connection between increased trading activity and market stress, but the overall contribution of ETH coin volume to BTC price movements remains limited.
plt.figure(figsize=(14, 8))
plt.plot(minmax_df.index, minmax_df['futures_btc_close_price'], color='blue', linewidth=1, label='futures_btc_close_price')
plt.plot(minmax_df.index, minmax_df['futures_btc_coin_volume'], color='green', linewidth=1, alpha=0.5, label='futures_btc_coin_volume')
plt.scatter(minmax_df[minmax_df['liquidation_cascades'] == 1].index,
minmax_df[minmax_df['liquidation_cascades'] == 1]['futures_btc_close_price'],
color='red', s=30, label='Liquidation Cascades') # Increased 's' value for larger dots
plt.title('BTC Price and futures_btc_coin_volume with Liquidation Cascades')
plt.xlabel('Time')
plt.ylabel('Scaled Price')
plt.legend(loc='upper center', bbox_to_anchor=(0.5, -0.1), fancybox=True, shadow=True, ncol=3)
plt.show()
The graph shows the relationship between the scaled BTC futures close price (blue line) and the futures BTC coin volume (green line), with liquidation cascades marked by red dots. The BTC price fluctuates significantly, reflecting market volatility. Volume spikes during periods of high activity, often coinciding with sharp price movements. Liquidation cascades typically occur during extreme price changes, indicating a strong link between high volatility, trading volume, and these events.
plt.figure(figsize=(14, 8))
plt.plot(minmax_df.index, minmax_df['futures_btc_close_price'], color='blue', linewidth=1, label='futures_btc_close_price')
plt.plot(minmax_df.index, minmax_df['futures_eth_coin_volume'], color='green', linewidth=1, alpha=0.5, label='futures_eth_coin_volume')
plt.scatter(minmax_df[minmax_df['liquidation_cascades'] == 1].index,
minmax_df[minmax_df['liquidation_cascades'] == 1]['futures_btc_close_price'],
color='red', s=30, label='Liquidation Cascades') # Increased 's' value for larger dots
plt.title('BTC Price and futures_eth_coin_volume with Liquidation Cascades')
plt.xlabel('Time')
plt.ylabel('Scaled Price')
plt.legend(loc='upper center', bbox_to_anchor=(0.5, -0.1), fancybox=True, shadow=True, ncol=3)
plt.show()
The graph shows the scaled BTC futures price (blue) and ETH coin volume (green) alongside liquidation cascades (red dots). The BTC price fluctuates significantly, indicating market volatility. Peaks in ETH volume often align with sharp BTC price movements. Liquidation cascades typically occur during or after these significant price changes, highlighting the link between volatility, ETH volume, and liquidation events.
plt.figure(figsize=(14, 8))
plt.plot(minmax_df.index, minmax_df['futures_btc_close_price'], color='blue', linewidth=1, label='futures_btc_close_price')
plt.plot(minmax_df.index, minmax_df['spot_btc_total_trades'], color='green', linewidth=1, alpha=0.5, label='spot_btc_total_trades')
plt.scatter(minmax_df[minmax_df['liquidation_cascades'] == 1].index,
minmax_df[minmax_df['liquidation_cascades'] == 1]['futures_btc_close_price'],
color='red', s=30, label='Liquidation Cascades') # Increased 's' value for larger dots
plt.title('BTC Price and spot_btc_total_trades with Liquidation Cascades')
plt.xlabel('Time')
plt.ylabel('Scaled Price')
plt.legend(loc='upper center', bbox_to_anchor=(0.5, -0.1), fancybox=True, shadow=True, ncol=3)
plt.show()
The green line, representing spot BTC total trades, shows noticeable fluctuations, particularly during periods of significant BTC price movements. The moderate negative correlation suggests that an increase in total BTC trades is often associated with a decrease in BTC price. The alignment of liquidation cascades with peaks in total trades highlights periods of intense market activity, indicating that total trades play a role during times of market stress, although their overall contribution to the model is relatively low.
plt.figure(figsize=(14, 8))
plt.plot(minmax_df.index, minmax_df['futures_btc_close_price'], color='blue', linewidth=1, label='futures_btc_close_price')
plt.plot(minmax_df.index, minmax_df['spot_btc_coin_volume'], color='green', linewidth=1, alpha=0.5, label='spot_btc_coin_volume')
plt.scatter(minmax_df[minmax_df['liquidation_cascades'] == 1].index,
minmax_df[minmax_df['liquidation_cascades'] == 1]['futures_btc_close_price'],
color='red', s=30, label='Liquidation Cascades') # Increased 's' value for larger dots
plt.title('BTC Price and spot_btc_coin_volume with Liquidation Cascades')
plt.xlabel('Time')
plt.ylabel('Scaled Price')
plt.legend(loc='upper center', bbox_to_anchor=(0.5, -0.1), fancybox=True, shadow=True, ncol=3)
plt.show()
Fluctuations in spot BTC coin volume seem to have a noticeable impact on the futures BTC closing price. Volume spikes are particularly evident during price declines. The liquidation cascades (red dots) coincide with sharp price drops, further highlighting the interaction between volume and price movements.
plt.figure(figsize=(14, 8))
plt.plot(minmax_df.index, minmax_df['futures_btc_close_price'], color='blue', linewidth=1, label='futures_btc_close_price')
plt.plot(minmax_df.index, minmax_df['futures_eth_CVD'], color='green', linewidth=1, alpha=0.5, label='futures_eth_CVD')
plt.scatter(minmax_df[minmax_df['liquidation_cascades'] == 1].index,
minmax_df[minmax_df['liquidation_cascades'] == 1]['futures_btc_close_price'],
color='red', s=30, label='Liquidation Cascades') # Increased 's' value for larger dots
plt.title('BTC Price and futures_eth_CVD with Liquidation Cascades')
plt.xlabel('Time')
plt.ylabel('Scaled Price')
plt.legend(loc='upper center', bbox_to_anchor=(0.5, -0.1), fancybox=True, shadow=True, ncol=3)
plt.show()
The ETH CVD exhibits a downward trend, suggesting a decreasing cumulative volume delta for ETH over this period. As ETH CVD decreases, there are multiple instances where BTC prices also show a decline, particularly around liquidation events, which are prominently marked in red. This indicates that the drop in ETH CVD might be correlated with BTC price movements, especially during periods of high market stress.
import matplotlib.pyplot as plt
plt.figure(figsize=(14, 8))
plt.plot(minmax_df.index, minmax_df['futures_btc_close_price'], color='blue', linewidth=1, label='futures_btc_close_price')
plt.plot(minmax_df.index, minmax_df['futures_btc_CVD'], color='green', linewidth=1, alpha=0.5, label='futures_btc_CVD')
plt.scatter(minmax_df[minmax_df['liquidation_cascades'] == 1].index,
minmax_df[minmax_df['liquidation_cascades'] == 1]['futures_btc_close_price'],
color='red', s=30, label='Liquidation Cascades') # Increased 's' value for larger dots
plt.title('BTC Price and futures_btc_CVD with Liquidation Cascades')
plt.xlabel('Time')
plt.ylabel('Scaled Price')
plt.legend(loc='upper center', bbox_to_anchor=(0.5, -0.1), fancybox=True, shadow=True, ncol=3)
plt.show()
There is a noticeable inverse relationship between the BTC CVD and the BTC futures close price, especially evident from mid-2022 onward. As BTC CVD decreases, the BTC futures price generally increases, highlighting the significant negative correlation observed in the analysis. Liquidation cascades, marked by red dots, occur during both price increases and decreases, often aligning with sharp price movements.
plt.figure(figsize=(14, 8))
plt.plot(minmax_df.index, minmax_df['futures_btc_close_price'], color='blue', linewidth=1, label='futures_btc_close_price')
plt.plot(minmax_df.index, minmax_df['futures_btc_coin_open_interest_close'], color='green', linewidth=1, alpha=0.5, label='futures_btc_coin_open_interest_close')
plt.scatter(minmax_df[minmax_df['liquidation_cascades'] == 1].index,
minmax_df[minmax_df['liquidation_cascades'] == 1]['futures_btc_close_price'],
color='red', s=30, label='Liquidation Cascades') # Increased 's' value for larger dots
plt.title('BTC Price and futures_btc_coin_open_interest_close with Liquidation Cascades')
plt.xlabel('Time')
plt.ylabel('Scaled Price')
plt.legend(loc='upper center', bbox_to_anchor=(0.5, -0.1), fancybox=True, shadow=True, ncol=3)
plt.show()
The BTC coin open interest (close) generally exhibits an inverse relationship with the BTC closing price, indicating that rising open interest is typically associated with price declines.
plt.figure(figsize=(14, 8))
plt.plot(minmax_df.index, minmax_df['futures_btc_close_price'], color='blue', linewidth=1, label='futures_btc_close_price')
plt.plot(minmax_df.index, minmax_df['futures_btc_liquidations_coin_volume'], color='green', linewidth=1, alpha=0.5, label='futures_btc_liquidations_coin_volume')
plt.scatter(minmax_df[minmax_df['liquidation_cascades'] == 1].index,
minmax_df[minmax_df['liquidation_cascades'] == 1]['futures_btc_close_price'],
color='red', s=30, label='Liquidation Cascades') # Increased 's' value for larger dots
plt.title('BTC Price and futures_btc_liquidations_coin_volume with Liquidation Cascades')
plt.xlabel('Time')
plt.ylabel('Scaled Price')
plt.legend(loc='upper center', bbox_to_anchor=(0.5, -0.1), fancybox=True, shadow=True, ncol=3)
plt.show()
This graph illustrates the relationship between futures_btc_close_price (blue line), futures_btc_liquidations_coin_volume (green line), and liquidation_cascades (red dots). The green line, representing futures_btc_liquidations_coin_volume, typically spikes during significant drops in the futures_btc_close_price, indicating periods of high liquidation activity. During these times, the red dots, which represent liquidation_cascades, are densely clustered in areas where the futures_btc_close_price is rapidly declining. Conversely, during periods where the futures_btc_liquidations_coin_volume remains relatively flat, the price fluctuations are less pronounced, and fewer liquidation_cascades are observed. This suggests that in times of lower liquidation activity, price movements tend to be more stable.
plt.figure(figsize=(14, 8))
plt.plot(minmax_df.index, minmax_df['futures_btc_close_price'], color='blue', linewidth=1, label='futures_btc_close_price')
plt.plot(minmax_df.index, minmax_df['btc_etf'], color='green', linewidth=1, alpha=0.5, label='btc_etf')
plt.scatter(minmax_df[minmax_df['liquidation_cascades'] == 1].index,
minmax_df[minmax_df['liquidation_cascades'] == 1]['futures_btc_close_price'],
color='red', s=30, label='Liquidation Cascades') # Increased 's' value for larger dots
plt.title('BTC Price and btc_etf with Liquidation Cascades')
plt.xlabel('Time')
plt.ylabel('Scaled Price')
plt.legend(loc='upper center', bbox_to_anchor=(0.5, -0.1), fancybox=True, shadow=True, ncol=3)
plt.show()
The green line representing the BTC ETF shows very minimal movement, indicating that its impact on the futures BTC closing price is negligible. There is no significant correlation between the BTC ETF and the futures BTC closing price, as both seem to move independently of each other. The red dots, indicating liquidation cascades, appear mainly during sharp declines in the blue line (futures BTC closing price), but there is no clear interaction between these events and the BTC ETF. This suggests that the BTC ETF does not play a significant role in influencing liquidation events or the overall price movement of BTC futures.
plt.figure(figsize=(14, 8))
plt.plot(minmax_df.index, minmax_df['futures_btc_close_price'], color='blue', linewidth=1, label='futures_btc_close_price')
plt.plot(minmax_df.index, minmax_df['eth_etf'], color='green', linewidth=1, alpha=0.5, label='eth_etf')
plt.scatter(minmax_df[minmax_df['liquidation_cascades'] == 1].index,
minmax_df[minmax_df['liquidation_cascades'] == 1]['futures_btc_close_price'],
color='red', s=30, label='Liquidation Cascades') # Increased 's' value for larger dots
plt.title('BTC Price and eth_etf with Liquidation Cascades')
plt.xlabel('Time')
plt.ylabel('Scaled Price')
plt.legend(loc='upper center', bbox_to_anchor=(0.5, -0.1), fancybox=True, shadow=True, ncol=3)
plt.show()
The green line representing the ETH ETF exhibits minimal movement and does not show a clear correlation with the blue line, which represents the futures BTC closing price. The ETH ETF seems to have little to no impact on the futures BTC price. The red dots, indicating liquidation cascades, are predominantly observed during periods of sharp declines in the blue line, but these events do not appear to be influenced by the ETH ETF. Overall, the ETH ETF does not significantly affect the BTC futures market or trigger liquidation cascades.
plt.figure(figsize=(14, 8))
plt.plot(minmax_df.index, minmax_df['futures_btc_close_price'], color='blue', linewidth=1, label='futures_btc_close_price')
plt.scatter(minmax_df[minmax_df['liquidation_cascades'] == 1].index,
minmax_df[minmax_df['liquidation_cascades'] == 1]['futures_btc_close_price'],
color='red', s=30, label='Liquidation Cascades') # Increased 's' value for larger dots
plt.title('BTC Price with Liquidation Cascades')
plt.xlabel('Time')
plt.ylabel('Scaled Price')
plt.legend(loc='upper center', bbox_to_anchor=(0.5, -0.1), fancybox=True, shadow=True, ncol=3)
plt.show()
This graph displays the BTC futures close price (blue line) and liquidation cascades (red dots) on their own, without any additional feature. The red dots cluster around significant drops in the blue line, showing a clear association between sharp price declines and increased liquidation events. The intensity and frequency of the red dots during these downturns highlight the cascading effect, where one liquidation can trigger subsequent liquidations and lead to more pronounced price drops. In periods where the blue line is more stable or rising, the absence of red dots indicates fewer liquidation events, suggesting that the market is less stressed.
plt.figure(figsize=(14, 8))
plt.plot(minmax_df.index, minmax_df['futures_btc_close_price'], color='blue', linewidth=1, label='futures_btc_close_price')
plt.plot(minmax_df.index, minmax_df['futures_btc_funding_rate'], color='green', linewidth=1, alpha=0.5, label='futures_btc_funding_rate')
plt.scatter(minmax_df[minmax_df['liquidation_cascades'] == 1].index,
minmax_df[minmax_df['liquidation_cascades'] == 1]['futures_btc_close_price'],
color='red', s=30, label='Liquidation Cascades') # Increased 's' value for larger dots
plt.title('BTC Price and futures_btc_funding_rate with Liquidation Cascades')
plt.xlabel('Time')
plt.ylabel('Scaled Price')
plt.legend(loc='upper center', bbox_to_anchor=(0.5, -0.1), fancybox=True, shadow=True, ncol=3)
plt.show()
The green line shows that when the funding rate is highly positive, there are often price drops, leading to liquidation events indicated by the red dots. As the funding rate fluctuates, it appears to correlate with significant price movements; a high positive funding rate can indicate an overheated market, often followed by sharp corrections and liquidations. Conversely, a negative funding rate tends to stabilize or slightly increase the BTC price, as seen in the periods with fewer red dots and steadier price movements.
plt.figure(figsize=(14, 8))
plt.plot(minmax_df.index, minmax_df['futures_btc_close_price'], color='blue', linewidth=1, label='futures_btc_close_price')
plt.plot(minmax_df.index, minmax_df['futures_eth_funding_rate'], color='green', linewidth=1, alpha=0.5, label='futures_eth_funding_rate')
plt.scatter(minmax_df[minmax_df['liquidation_cascades'] == 1].index,
minmax_df[minmax_df['liquidation_cascades'] == 1]['futures_btc_close_price'],
color='red', s=30, label='Liquidation Cascades') # Increased 's' value for larger dots
plt.title('BTC Price and futures_eth_funding_rate with Liquidation Cascades')
plt.xlabel('Time')
plt.ylabel('Scaled Price')
plt.legend(loc='upper center', bbox_to_anchor=(0.5, -0.1), fancybox=True, shadow=True, ncol=3)
plt.show()
The green line, representing the funding rate for Ethereum futures, demonstrates a pattern where high positive funding rates often coincide with sharp BTC price drops, leading to increased liquidation events. This can be observed particularly in the earlier periods of the graph where the spikes in the funding rate are followed by significant declines in BTC price and clustering of red dots. Conversely, periods with a neutral or negative funding rate tend to show more stability in BTC prices with fewer liquidation events, suggesting a stabilizing effect when the funding rate is low or negative.
plt.figure(figsize=(14, 8))
plt.plot(minmax_df.index, minmax_df['futures_btc_close_price'], color='blue', linewidth=1, label='futures_btc_close_price')
plt.plot(minmax_df.index, minmax_df['btc_futures_to_spot'], color='green', linewidth=1, alpha=0.5, label='btc_futures_to_spot')
plt.scatter(minmax_df[minmax_df['liquidation_cascades'] == 1].index,
minmax_df[minmax_df['liquidation_cascades'] == 1]['futures_btc_close_price'],
color='red', s=30, label='Liquidation Cascades') # Increased 's' value for larger dots
plt.title('BTC Price and btc_futures_to_spot with Liquidation Cascades')
plt.xlabel('Time')
plt.ylabel('Scaled Price')
plt.legend(loc='upper center', bbox_to_anchor=(0.5, -0.1), fancybox=True, shadow=True, ncol=3)
plt.show()
The green line representing btc_futures_to_spot shows occasional spikes and dips, indicating moments where the futures price deviates significantly from the spot price. These deviations tend to occur before or during sharp declines in the futures_btc_close_price, as indicated by the blue line. The red dots marking liquidation_cascades are often clustered around these periods of deviation, suggesting that large discrepancies between futures and spot prices can precede or coincide with liquidation events, which are typically associated with sharp price drops. Overall, btc_futures_to_spot appears to be a leading indicator of market instability, with significant implications for both price movements and liquidation cascades.
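One way to probe the "leading indicator" reading is to correlate the feature at time t with the target at time t + k for several leads k. The sketch below is only an illustration: the helper name `lead_lag_correlation` and the default of 24 leads are assumptions, and a correlation at a positive lead does not by itself establish predictive power.

```python
import numpy as np
import pandas as pd


def lead_lag_correlation(feature, target, max_lead=24):
    """Pearson correlation between feature[t] and target[t + k] for k = 0..max_lead."""
    return pd.Series(
        # shift(-k) aligns the target k steps ahead with the current feature value
        {k: feature.corr(target.shift(-k)) for k in range(max_lead + 1)},
        name='correlation',
    )
```

A call such as `lead_lag_correlation(prediction_df['btc_futures_to_spot'], prediction_df['log_returns'].abs())` would show whether the ratio is more strongly related to later absolute returns than to contemporaneous ones.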
plt.figure(figsize=(14, 8))
plt.plot(minmax_df.index, minmax_df['futures_btc_close_price'], color='blue', linewidth=1, label='futures_btc_close_price')
plt.plot(minmax_df.index, minmax_df['eth_futures_to_spot'], color='green', linewidth=1, alpha=0.5, label='eth_futures_to_spot')
plt.scatter(minmax_df[minmax_df['liquidation_cascades'] == 1].index,
minmax_df[minmax_df['liquidation_cascades'] == 1]['futures_btc_close_price'],
color='red', s=30, label='Liquidation Cascades') # Increased 's' value for larger dots
plt.title('BTC Price and eth_futures_to_spot with Liquidation Cascades')
plt.xlabel('Time')
plt.ylabel('Scaled Price')
plt.legend(loc='upper center', bbox_to_anchor=(0.5, -0.1), fancybox=True, shadow=True, ncol=3)
plt.show()
The green line represents the eth_futures_to_spot ratio, showing occasional spikes where the futures price diverges significantly from the spot price. These divergences are often followed or accompanied by sharp drops in the futures_btc_close_price (blue line). The red dots, representing liquidation_cascades, tend to cluster around these periods of divergence, indicating that large differences between Ethereum futures and spot prices may trigger liquidation events, especially during volatile market conditions. This suggests that the eth_futures_to_spot ratio can be a significant indicator of potential instability in the market, affecting both price movements and the likelihood of liquidation cascades.
The futures ETH price is very closely related to the futures BTC price, and market movements generally affect both assets in the same direction. Typically BTC moves first and ETH prices then adapt, although sometimes the opposite occurs or both assets react at the same time. Investors may see this as a short-term opportunity, but with such a tight fit, using the ETH price as an independent variable when predicting the futures BTC price may cause overfitting. For this reason, the futures ETH price is removed from the dataset so that predictions are more consistent with real-world conditions.
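A quick way to spot features that track the target this closely is to list those whose absolute correlation with it exceeds a cutoff. This is a minimal sketch; the helper name and the 0.8 threshold are assumptions chosen for illustration, not values taken from the notebook.

```python
import pandas as pd


def highly_correlated_with_target(df, target, threshold=0.8):
    """Return features whose absolute Pearson correlation with `target` exceeds `threshold`."""
    corr = df.corr()[target].drop(target)  # drop the target's correlation with itself
    return corr[corr.abs() > threshold].sort_values(ascending=False)
```

Applied as `highly_correlated_with_target(minmax_df, 'futures_btc_close_price')`, it returns the candidates to review before deciding what to drop.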
minmax_scaled_price = minmax_df.drop(columns='futures_eth_close_price', inplace=False)
minmax_scaled_price.to_csv('minmax_scaled_price.csv', index=False)
prediction_scaled_price = prediction_df.drop(columns='futures_eth_close_price', inplace=False)
prediction_scaled_price.to_csv('prediction_scaled_price.csv', index=False)
The following features are removed from the dataset because their relationship with the target value 'futures_btc_close_price', and hence with 'liquidation_cascades', is quite weak compared to the other variables.
columns_to_delete = ['futures_btc_dollar_volume', 'futures_btc_liquidations_coin_volume',
'spot_eth_dollar_volume', 'spot_eth_total_trades', 'futures_eth_total_trades',
'spot_eth_coin_volume', 'futures_eth_coin_volume', 'spot_btc_total_trades',
'futures_eth_liquidations_coin_volume',
'futures_eth_dollar_volume', 'futures_btc_total_trades', 'spot_btc_CVD', 'spot_eth_CVD',
'spot_btc_dollar_volume']
prediction_df = prediction_df.drop(columns=columns_to_delete, errors='ignore')
minmax_df = minmax_df.drop(columns=columns_to_delete, errors='ignore')
# Note: these writes reuse the file names saved above, so they overwrite the copies without
# 'futures_eth_close_price'; minmax_df and prediction_df still contain that column.
minmax_df.to_csv('minmax_scaled_price.csv', index=False)
prediction_df.to_csv('prediction_scaled_price.csv', index=False)
Although the correlation and statistical significance of both BTC and ETH ETF events with the target feature 'futures_btc_close_price' are close to zero, these events matter in the context of liquidation cascades. As the charts above show, liquidation cascades have generally occurred during or after ETF events. While these cascades are usually downward, some ETF events have also been followed by sudden upward spikes (short-position liquidations). Investors should closely monitor their positions before and during these events and evaluate them together with other metrics. For these reasons, the ETF event dates are kept in the dataset; they can serve as an important warning signal for future models of liquidation cascades.
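The observation that cascades tend to occur during or after ETF events can be checked by counting cascade flags in a window after each event. The sketch below assumes the event column is 1 on event rows and 0 elsewhere; the helper name and the 72-row window (three days of hourly data) are hypothetical choices.

```python
import pandas as pd


def cascades_after_events(df, event_col, cascade_col='liquidation_cascades', window=72):
    """For each event row, count cascade flags in that row and the next `window` rows."""
    events = df[event_col].to_numpy()
    cascades = df[cascade_col].to_numpy()
    counts = {
        df.index[i]: int(cascades[i:i + window + 1].sum())
        for i, flag in enumerate(events) if flag == 1
    }
    return pd.Series(counts, name='cascades_in_window', dtype='int64')
```

For example, `cascades_after_events(prediction_df, 'btc_etf')` would return one count per flagged row, which can then be compared with the average cascade rate over windows of the same length.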
cor_btc = prediction_df
correlation_matrix = cor_btc.corr()
target_correlation = correlation_matrix['futures_btc_close_price'].sort_values(ascending=False)
with pd.option_context('display.max_rows', None):
    print(target_correlation)
futures_btc_close_price                 1.000000
futures_eth_close_price                 0.822552
btc_futures_to_spot                     0.386361
futures_btc_funding_rate                0.320078
futures_eth_funding_rate                0.286436
eth_futures_to_spot                     0.282220
liquidation_cascades                    0.221141
volatility                              0.087102
log_returns                             0.010335
eth_etf                                 0.006500
btc_etf                                 0.005272
futures_btc_coin_volume                -0.224072
futures_eth_coin_open_interest_close   -0.398240
spot_btc_coin_volume                   -0.414169
futures_eth_CVD                        -0.426634
futures_btc_CVD                        -0.434872
futures_btc_coin_open_interest_close   -0.621279
Name: futures_btc_close_price, dtype: float64
correlation_matrix = prediction_df.corr()
plt.figure(figsize=(12, 10))
sns.heatmap(correlation_matrix, annot=True, fmt=".2f", cmap='coolwarm', linewidths=0.5)
plt.title('Correlation Heatmap of prediction_df')
plt.show()
To summarize feature selection: the dataset now retains only independent variables that are not expected to produce predictions disconnected from the real world, that can help predict Bitcoin's futures price consistently, and that are relevant to liquidation cascades.
Some variables show only a small linear relationship with the target in the correlation map. These are the binary features 'eth_etf' and 'btc_etf' (correlations close to zero) and, to a lesser extent, 'liquidation_cascades' (about 0.22). They are particularly important for liquidation cascades but are not strongly informative for price prediction.
The features to drop were identified by their weak relationship with the target in the analysis and graphs above.
columns_to_delete = ['futures_btc_dollar_volume', 'futures_btc_liquidations_coin_volume',
'spot_eth_dollar_volume', 'spot_eth_total_trades', 'futures_eth_total_trades',
'spot_eth_coin_volume', 'futures_eth_coin_volume', 'spot_btc_total_trades',
'futures_eth_liquidations_coin_volume',
'futures_eth_dollar_volume', 'futures_btc_total_trades', 'spot_btc_CVD', 'spot_eth_CVD',
'spot_btc_dollar_volume']
prediction_df= prediction_df.drop(columns = columns_to_delete, errors='ignore')
minmax_df= minmax_df.drop(columns = columns_to_delete, errors='ignore')
minmax_df.to_csv('minmax_scaled_log_returns.csv', index=False)
prediction_df.to_csv('prediction_scaled_log_returns.csv', index=False)
log_returns = pd.read_csv("prediction_scaled_log_returns.csv")
price = pd.read_csv("prediction_scaled_price.csv")
log_returns.columns
Index(['spot_btc_coin_volume', 'futures_btc_close_price',
'futures_btc_coin_volume', 'futures_btc_coin_open_interest_close',
'futures_btc_funding_rate', 'futures_btc_CVD',
'futures_eth_close_price', 'futures_eth_coin_open_interest_close',
'futures_eth_funding_rate', 'futures_eth_CVD', 'eth_etf', 'btc_etf',
'liquidation_cascades', 'btc_futures_to_spot', 'eth_futures_to_spot',
'log_returns', 'volatility'],
dtype='object')
price.columns
Index(['spot_btc_coin_volume', 'futures_btc_close_price',
'futures_btc_coin_volume', 'futures_btc_coin_open_interest_close',
'futures_btc_funding_rate', 'futures_btc_CVD',
'futures_eth_close_price', 'futures_eth_coin_open_interest_close',
'futures_eth_funding_rate', 'futures_eth_CVD', 'eth_etf', 'btc_etf',
'liquidation_cascades', 'btc_futures_to_spot', 'eth_futures_to_spot',
'log_returns', 'volatility'],
dtype='object')
prediction_df is reloaded and its index is set to an hourly datetime index.
prediction_df = pd.read_csv("prediction_scaled_log_returns.csv")
prediction_df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 31339 entries, 0 to 31338
Data columns (total 17 columns):
 #   Column                                Non-Null Count  Dtype
---  ------                                --------------  -----
 0   spot_btc_coin_volume                  31339 non-null  float64
 1   futures_btc_close_price               31339 non-null  float64
 2   futures_btc_coin_volume               31339 non-null  float64
 3   futures_btc_coin_open_interest_close  31339 non-null  float64
 4   futures_btc_funding_rate              31339 non-null  float64
 5   futures_btc_CVD                       31339 non-null  float64
 6   futures_eth_close_price               31339 non-null  float64
 7   futures_eth_coin_open_interest_close  31339 non-null  float64
 8   futures_eth_funding_rate              31339 non-null  float64
 9   futures_eth_CVD                       31339 non-null  float64
 10  eth_etf                               31339 non-null  int64
 11  btc_etf                               31339 non-null  int64
 12  liquidation_cascades                  31339 non-null  int64
 13  btc_futures_to_spot                   31339 non-null  float64
 14  eth_futures_to_spot                   31339 non-null  float64
 15  log_returns                           31339 non-null  float64
 16  volatility                            31339 non-null  float64
dtypes: float64(14), int64(3)
memory usage: 4.1 MB
start_date = '2021-01-02 00:00:00'
end_date = '2024-07-30 18:00:00'
freq = 'H'
datetime_index = pd.date_range(start=start_date, end=end_date, freq=freq)
prediction_df = prediction_df.set_index(datetime_index)
prediction_df.info()
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 31339 entries, 2021-01-02 00:00:00 to 2024-07-30 18:00:00
Freq: H
Data columns (total 17 columns):
 #   Column                                Non-Null Count  Dtype
---  ------                                --------------  -----
 0   spot_btc_coin_volume                  31339 non-null  float64
 1   futures_btc_close_price               31339 non-null  float64
 2   futures_btc_coin_volume               31339 non-null  float64
 3   futures_btc_coin_open_interest_close  31339 non-null  float64
 4   futures_btc_funding_rate              31339 non-null  float64
 5   futures_btc_CVD                       31339 non-null  float64
 6   futures_eth_close_price               31339 non-null  float64
 7   futures_eth_coin_open_interest_close  31339 non-null  float64
 8   futures_eth_funding_rate              31339 non-null  float64
 9   futures_eth_CVD                       31339 non-null  float64
 10  eth_etf                               31339 non-null  int64
 11  btc_etf                               31339 non-null  int64
 12  liquidation_cascades                  31339 non-null  int64
 13  btc_futures_to_spot                   31339 non-null  float64
 14  eth_futures_to_spot                   31339 non-null  float64
 15  log_returns                           31339 non-null  float64
 16  volatility                            31339 non-null  float64
dtypes: float64(14), int64(3)
memory usage: 4.3 MB
import pandas as pd
import matplotlib.pyplot as plt
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf
# Columns to be used (excluding binary columns)
columns_to_use = [
'spot_btc_coin_volume', 'futures_btc_coin_volume',
'futures_btc_coin_open_interest_close', 'futures_btc_funding_rate',
'futures_btc_CVD', 'futures_eth_coin_open_interest_close',
'futures_eth_funding_rate', 'futures_eth_CVD',
'btc_futures_to_spot', 'eth_futures_to_spot', 'log_returns',
'volatility'
]
# Filtering the data for ACF and PACF analysis
data_to_analyze = prediction_df[columns_to_use]
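# Plotting ACF and PACF graphs for each column
for column in data_to_analyze.columns:
    print(f'ACF and PACF Plots - {column}')
    # ACF plot: pass ax so the plot is drawn on the sized figure
    # (without ax, plot_acf opens its own figure and the sized one stays empty)
    fig, ax = plt.subplots(figsize=(10, 5))
    plot_acf(data_to_analyze[column], lags=1440, ax=ax)
    ax.set_title(f'ACF for {column}')
    plt.show()
    # PACF plot, drawn the same way
    fig, ax = plt.subplots(figsize=(10, 5))
    plot_pacf(data_to_analyze[column], lags=1440, ax=ax)
    ax.set_title(f'PACF for {column}')
    plt.show()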
ACF and PACF Plots - spot_btc_coin_volume
ACF and PACF Plots - futures_btc_coin_volume
ACF and PACF Plots - futures_btc_coin_open_interest_close
ACF and PACF Plots - futures_btc_funding_rate
ACF and PACF Plots - futures_btc_CVD
ACF and PACF Plots - futures_eth_coin_open_interest_close
ACF and PACF Plots - futures_eth_funding_rate
ACF and PACF Plots - futures_eth_CVD
ACF and PACF Plots - btc_futures_to_spot
ACF and PACF Plots - eth_futures_to_spot
ACF and PACF Plots - log_returns
ACF and PACF Plots - volatility
The ACF (Autocorrelation Function) and PACF (Partial Autocorrelation Function) plots provide critical insights into the temporal dependencies within the variables of the dataset. For most variables, including futures-related features, there is a strong positive autocorrelation at initial lags, which gradually decreases over time. This pattern suggests that recent past values significantly influence current values. The PACF plots reinforce this observation, indicating that most predictive information is contained within the first few lags. However, as the lag increases, the influence diminishes, implying that distant past values have less predictive power. Therefore, using lower to moderate lag values (e.g., t-1, t-2, t-3) in predictive models would likely yield the best results.
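The short lags suggested above (t-1, t-2, t-3) can be turned into model inputs by shifting each column. This is a minimal sketch; the helper name `add_lag_features` and the column naming scheme are assumptions made for illustration.

```python
import pandas as pd


def add_lag_features(df, columns, lags=(1, 2, 3)):
    """Append `<col>_lag<k>` columns holding each column's value k rows earlier."""
    out = df.copy()
    for col in columns:
        for lag in lags:
            out[f'{col}_lag{lag}'] = out[col].shift(lag)
    # The first max(lags) rows have no history, so they contain NaN and are dropped
    return out.dropna()
```

Because `shift` only looks backward, each lagged column uses information available before the row it sits on, which avoids leaking future values into the features.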
The ACF and PACF plots for log returns and volatility show distinct patterns. For log returns, the autocorrelation equals 1 at lag 0 (as it does for any series) and is nearly zero at subsequent lags, indicating little memory in the data. In contrast, the ACF for volatility decays slowly, suggesting that volatility is persistent over time; the PACF for volatility supports this, with significant partial autocorrelations at multiple lags. These observations highlight the importance of adjusting the lag structure to the variable: only the most recent values are likely to matter for log returns, whereas a broader range of lagged values is beneficial for volatility predictions.
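The contrast between the two series can be reproduced on synthetic data. The sketch below assumes volatility is a rolling standard deviation of log returns, as in the daily recalculation in this notebook; the simulated returns, the seed, and the 10-row window are illustrative assumptions. It also shows that a rolling-window estimate is autocorrelated by construction at lags shorter than the window, because consecutive windows overlap, which is worth keeping in mind when reading the volatility ACF.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
# Synthetic stand-in for log returns: independent draws, so no memory
returns = pd.Series(rng.normal(0, 0.01, size=5000))
# Rolling standard deviation over 10 rows; consecutive windows share 9 observations
volatility = returns.rolling(window=10).std().dropna()

for lag in (1, 5, 24):
    print(f"lag {lag:>2}: returns {returns.autocorr(lag):+.3f}, "
          f"volatility {volatility.autocorr(lag):+.3f}")
```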
Given the high computational cost and time involved in conducting Granger causality tests on hourly data, we have chosen to re-sample our dataset to daily frequency. This approach balances the need for robust causality analysis with the practical constraints of computational resources, allowing us to effectively apply Granger causality to identify the key drivers of liquidation cascades in the cryptocurrency market.
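For intuition about what the Granger test evaluates, the F statistic can be computed by hand: fit the target on its own lags (restricted model), fit it again with lags of the candidate feature added (unrestricted model), and compare the residual sums of squares. The NumPy sketch below is an illustration under that textbook definition, not the statsmodels implementation; the function name is hypothetical.

```python
import numpy as np


def granger_f_stat(y, x, lag):
    """F statistic for 'lags of x add nothing to predicting y beyond lags of y'."""
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    n = len(y) - lag
    target = y[lag:]
    # Column k-1 holds the series shifted back by k steps, aligned with `target`
    lag_y = np.column_stack([y[lag - k:len(y) - k] for k in range(1, lag + 1)])
    lag_x = np.column_stack([x[lag - k:len(x) - k] for k in range(1, lag + 1)])
    const = np.ones((n, 1))

    def ssr(design):
        beta, *_ = np.linalg.lstsq(design, target, rcond=None)
        resid = target - design @ beta
        return float(resid @ resid)

    ssr_restricted = ssr(np.hstack([const, lag_y]))
    ssr_unrestricted = ssr(np.hstack([const, lag_y, lag_x]))
    df_resid = n - (2 * lag + 1)  # observations minus unrestricted parameters
    return ((ssr_restricted - ssr_unrestricted) / lag) / (ssr_unrestricted / df_resid)
```

A large F value means the lagged feature reduces the prediction error noticeably; the p-value reported by a Granger test comes from comparing this statistic with an F distribution with `lag` and `df_resid` degrees of freedom.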
predictions_daily = prediction_df.resample('D').last()
# log_returns and volatility must be recalculated: the dataset is now daily,
# so both features have to be computed from daily closing prices.
predictions_daily = predictions_daily.drop(columns=['log_returns', 'volatility'])
predictions_daily['log_returns'] = np.log(predictions_daily['futures_btc_close_price'] / predictions_daily['futures_btc_close_price'].shift(1))
predictions_daily['volatility'] = predictions_daily['log_returns'].rolling(window=10).std()
predictions_daily.info()
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 1306 entries, 2021-01-02 to 2024-07-30
Freq: D
Data columns (total 17 columns):
 #   Column                                Non-Null Count  Dtype
---  ------                                --------------  -----
 0   spot_btc_coin_volume                  1306 non-null   float64
 1   futures_btc_close_price               1306 non-null   float64
 2   futures_btc_coin_volume               1306 non-null   float64
 3   futures_btc_coin_open_interest_close  1306 non-null   float64
 4   futures_btc_funding_rate              1306 non-null   float64
 5   futures_btc_CVD                       1306 non-null   float64
 6   futures_eth_close_price               1306 non-null   float64
 7   futures_eth_coin_open_interest_close  1306 non-null   float64
 8   futures_eth_funding_rate              1306 non-null   float64
 9   futures_eth_CVD                       1306 non-null   float64
 10  eth_etf                               1306 non-null   int64
 11  btc_etf                               1306 non-null   int64
 12  liquidation_cascades                  1306 non-null   int64
 13  btc_futures_to_spot                   1306 non-null   float64
 14  eth_futures_to_spot                   1306 non-null   float64
 15  log_returns                           1305 non-null   float64
 16  volatility                            1296 non-null   float64
dtypes: float64(14), int64(3)
memory usage: 183.7 KB
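Taking the last hourly value of each day works well for prices, but for binary flags it keeps only what happened in the final hour. If the daily data should record whether an event occurred at any point during the day, one possible alternative is to aggregate flag columns with `max`. This is a sketch of that alternative, not the method used in this notebook; the helper name is hypothetical.

```python
import pandas as pd


def resample_daily(df, flag_cols):
    """Daily resample: last value for continuous columns, max for binary flag columns."""
    agg = {col: ('max' if col in flag_cols else 'last') for col in df.columns}
    return df.resample('D').agg(agg)
```

For instance, `resample_daily(prediction_df, ['eth_etf', 'btc_etf', 'liquidation_cascades'])` would mark a day as 1 whenever any of its hours was flagged.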
predictions_daily = predictions_daily.dropna()
# Features to be excluded: 'eth_etf', 'btc_etf', 'liquidation_cascades'
excluded_features = ['eth_etf', 'btc_etf', 'liquidation_cascades']
target = 'log_returns'
# Selecting the features to include in the test
features = [col for col in predictions_daily.columns if col not in excluded_features and col != target]
# Granger Causality Test
from statsmodels.tsa.stattools import grangercausalitytests
maxlag = 60 # 60-day lag (approximately 2 months)
# Dictionary to store the results
results = {}
# Performing Granger Causality Test for each feature
for feature in features:
    print(f"\nTesting Granger Causality between {feature} and {target}")
    # Column order matters: the test checks whether the second column helps predict the first
    test_result = grangercausalitytests(predictions_daily[[target, feature]], maxlag=maxlag, verbose=False)
    results[feature] = test_result
# Reviewing the results
for feature, result in results.items():
    print(f"\nFeature: {feature}")
    for lag in result:
        print(f"Lag {lag} p-value: {result[lag][0]['ssr_ftest'][1]:.4f}") # F-test p-value
Testing Granger Causality between spot_btc_coin_volume and log_returns
/Users/alperenunal/anaconda3/lib/python3.11/site-packages/statsmodels/tsa/stattools.py:1488: FutureWarning: verbose is deprecated since functions should not print results warnings.warn(
Testing Granger Causality between futures_btc_close_price and log_returns
Testing Granger Causality between futures_btc_coin_volume and log_returns
Testing Granger Causality between futures_btc_coin_open_interest_close and log_returns
Testing Granger Causality between futures_btc_funding_rate and log_returns
Testing Granger Causality between futures_btc_CVD and log_returns
Testing Granger Causality between futures_eth_close_price and log_returns
Testing Granger Causality between futures_eth_coin_open_interest_close and log_returns
Testing Granger Causality between futures_eth_funding_rate and log_returns
Testing Granger Causality between futures_eth_CVD and log_returns
Testing Granger Causality between btc_futures_to_spot and log_returns
Testing Granger Causality between eth_futures_to_spot and log_returns
Testing Granger Causality between volatility and log_returns
Feature: spot_btc_coin_volume
Lag 1 p-value: 0.1100 Lag 2 p-value: 0.2437 Lag 3 p-value: 0.2708 Lag 4 p-value: 0.2473 Lag 5 p-value: 0.2233 Lag 6 p-value: 0.3123 Lag 7 p-value: 0.3271 Lag 8 p-value: 0.4850 Lag 9 p-value: 0.5784 Lag 10 p-value: 0.6444
Lag 11 p-value: 0.6729 Lag 12 p-value: 0.7282 Lag 13 p-value: 0.7914 Lag 14 p-value: 0.8389 Lag 15 p-value: 0.6477 Lag 16 p-value: 0.6497 Lag 17 p-value: 0.6867 Lag 18 p-value: 0.7526 Lag 19 p-value: 0.8024 Lag 20 p-value: 0.7823
Lag 21 p-value: 0.5659 Lag 22 p-value: 0.6363 Lag 23 p-value: 0.6989 Lag 24 p-value: 0.7574 Lag 25 p-value: 0.8217 Lag 26 p-value: 0.8544 Lag 27 p-value: 0.8431 Lag 28 p-value: 0.8262 Lag 29 p-value: 0.8653 Lag 30 p-value: 0.8859
Lag 31 p-value: 0.8449 Lag 32 p-value: 0.8616 Lag 33 p-value: 0.8965 Lag 34 p-value: 0.9101 Lag 35 p-value: 0.9320 Lag 36 p-value: 0.9378 Lag 37 p-value: 0.8966 Lag 38 p-value: 0.9089 Lag 39 p-value: 0.8889 Lag 40 p-value: 0.8967
Lag 41 p-value: 0.8529 Lag 42 p-value: 0.8342 Lag 43 p-value: 0.8836 Lag 44 p-value: 0.9135 Lag 45 p-value: 0.9114 Lag 46 p-value: 0.8396 Lag 47 p-value: 0.8471 Lag 48 p-value: 0.8665 Lag 49 p-value: 0.8428 Lag 50 p-value: 0.8557
Lag 51 p-value: 0.8575 Lag 52 p-value: 0.8486 Lag 53 p-value: 0.8708 Lag 54 p-value: 0.8837 Lag 55 p-value: 0.8899 Lag 56 p-value: 0.9088 Lag 57 p-value: 0.8241 Lag 58 p-value: 0.6907 Lag 59 p-value: 0.7213 Lag 60 p-value: 0.7576

Feature: futures_btc_close_price
Lag 1 p-value: 0.2746 Lag 2 p-value: 0.3812 Lag 3 p-value: 0.4778 Lag 4 p-value: 0.5367 Lag 5 p-value: 0.6647 Lag 6 p-value: 0.7716 Lag 7 p-value: 0.8712 Lag 8 p-value: 0.9376 Lag 9 p-value: 0.9542 Lag 10 p-value: 0.9221
Lag 11 p-value: 0.9230 Lag 12 p-value: 0.9614 Lag 13 p-value: 0.8194 Lag 14 p-value: 0.8145 Lag 15 p-value: 0.8175 Lag 16 p-value: 0.8418 Lag 17 p-value: 0.8597 Lag 18 p-value: 0.8533 Lag 19 p-value: 0.8968 Lag 20 p-value: 0.9116
Lag 21 p-value: 0.8691 Lag 22 p-value: 0.7436 Lag 23 p-value: 0.7856 Lag 24 p-value: 0.8460 Lag 25 p-value: 0.8508 Lag 26 p-value: 0.8769 Lag 27 p-value: 0.9086 Lag 28 p-value: 0.7599 Lag 29 p-value: 0.7892 Lag 30 p-value: 0.8281
Lag 31 p-value: 0.8472 Lag 32 p-value: 0.8189 Lag 33 p-value: 0.7344 Lag 34 p-value: 0.7609 Lag 35 p-value: 0.7958 Lag 36 p-value: 0.8238 Lag 37 p-value: 0.8636 Lag 38 p-value: 0.8899 Lag 39 p-value: 0.4638 Lag 40 p-value: 0.4584
Lag 41 p-value: 0.4924 Lag 42 p-value: 0.5440 Lag 43 p-value: 0.5626 Lag 44 p-value: 0.6250 Lag 45 p-value: 0.6264 Lag 46 p-value: 0.6943 Lag 47 p-value: 0.7046 Lag 48 p-value: 0.7412 Lag 49 p-value: 0.6558 Lag 50 p-value: 0.6899
Lag 51 p-value: 0.7317 Lag 52 p-value: 0.7552 Lag 53 p-value: 0.7312 Lag 54 p-value: 0.7668 Lag 55 p-value: 0.8048 Lag 56 p-value: 0.7975 Lag 57 p-value: 0.7770 Lag 58 p-value: 0.8140 Lag 59 p-value: 0.7875 Lag 60 p-value: 0.8467

Feature: futures_btc_coin_volume
Lag 1 p-value: 0.1436 Lag 2 p-value: 0.3407 Lag 3 p-value: 0.4779 Lag 4 p-value: 0.4546 Lag 5 p-value: 0.2064 Lag 6 p-value: 0.3042 Lag 7 p-value: 0.4201 Lag 8 p-value: 0.5701 Lag 9 p-value: 0.6383 Lag 10 p-value: 0.6826
Lag 11 p-value: 0.4844 Lag 12 p-value: 0.5019 Lag 13 p-value: 0.4932 Lag 14 p-value: 0.5174 Lag 15 p-value: 0.5698 Lag 16 p-value: 0.5338 Lag 17 p-value: 0.5183 Lag 18 p-value: 0.5485 Lag 19 p-value: 0.5817 Lag 20 p-value: 0.6196
Lag 21 p-value: 0.6071 Lag 22 p-value: 0.6988 Lag 23 p-value: 0.7565 Lag 24 p-value: 0.8000 Lag 25 p-value: 0.8140 Lag 26 p-value: 0.8431 Lag 27 p-value: 0.8388 Lag 28 p-value: 0.8295 Lag 29 p-value: 0.8535 Lag 30 p-value: 0.8778
Lag 31 p-value: 0.8723 Lag 32 p-value: 0.8863 Lag 33 p-value: 0.8603 Lag 34 p-value: 0.8776 Lag 35 p-value: 0.9032 Lag 36 p-value: 0.9204 Lag 37 p-value: 0.9012 Lag 38 p-value: 0.9105 Lag 39 p-value: 0.8978 Lag 40 p-value: 0.8551
Lag 41 p-value: 0.7984 Lag 42 p-value: 0.8156 Lag 43 p-value: 0.8310 Lag 44 p-value: 0.8383 Lag 45 p-value: 0.8630 Lag 46 p-value: 0.8278 Lag 47 p-value: 0.8222 Lag 48 p-value: 0.8499 Lag 49 p-value: 0.7990 Lag 50 p-value: 0.7788
Lag 51 p-value: 0.8141 Lag 52 p-value: 0.8138 Lag 53 p-value: 0.8434 Lag 54 p-value: 0.8208 Lag 55 p-value: 0.8514 Lag 56 p-value: 0.8745 Lag 57 p-value: 0.7344 Lag 58 p-value: 0.5931 Lag 59 p-value: 0.6270 Lag 60 p-value: 0.6491

Feature: futures_btc_coin_open_interest_close
Lag 1 p-value: 0.5453 Lag 2 p-value: 0.8640 Lag 3 p-value: 0.3737 Lag 4 p-value: 0.3737 Lag 5 p-value: 0.4476 Lag 6 p-value: 0.5005 Lag 7 p-value: 0.4952 Lag 8 p-value: 0.4601 Lag 9 p-value: 0.5522 Lag 10 p-value: 0.5982
Lag 11 p-value: 0.6913 Lag 12 p-value: 0.6537 Lag 13 p-value: 0.7011 Lag 14 p-value: 0.6622 Lag 15 p-value: 0.7352 Lag 16 p-value: 0.7185 Lag 17 p-value: 0.7655 Lag 18 p-value: 0.8139 Lag 19 p-value: 0.7565 Lag 20 p-value: 0.8089
Lag 21 p-value: 0.8051 Lag 22 p-value: 0.8477 Lag 23 p-value: 0.8678 Lag 24 p-value: 0.9309 Lag 25 p-value: 0.9574 Lag 26 p-value: 0.9672 Lag 27 p-value: 0.9494 Lag 28 p-value: 0.9453 Lag 29 p-value: 0.9564 Lag 30 p-value: 0.9622
Lag 31 p-value: 0.9743 Lag 32 p-value: 0.9783 Lag 33 p-value: 0.9636 Lag 34 p-value: 0.9702 Lag 35 p-value: 0.9752 Lag 36 p-value: 0.9768 Lag 37 p-value: 0.8407 Lag 38 p-value: 0.8697 Lag 39 p-value: 0.8315 Lag 40 p-value: 0.8779
Lag 41 p-value: 0.8944 Lag 42 p-value: 0.9070 Lag 43 p-value: 0.9192 Lag 44 p-value: 0.9142 Lag 45 p-value: 0.9041 Lag 46 p-value: 0.9199 Lag 47 p-value: 0.9157 Lag 48 p-value: 0.8806 Lag 49 p-value: 0.8608 Lag 50 p-value: 0.8142
Lag 51 p-value: 0.7491 Lag 52 p-value: 0.7680 Lag 53 p-value: 0.7719 Lag 54 p-value: 0.7880 Lag 55 p-value: 0.7668 Lag 56 p-value: 0.7877 Lag 57 p-value: 0.8189 Lag 58 p-value: 0.7240 Lag 59 p-value: 0.7226 Lag 60 p-value: 0.7631

Feature: futures_btc_funding_rate
Lag 1 p-value: 0.3505 Lag 2 p-value: 0.5591 Lag 3 p-value: 0.1187 Lag 4 p-value: 0.0351 Lag 5 p-value: 0.0337 Lag 6 p-value: 0.0262 Lag 7 p-value: 0.0504 Lag 8 p-value: 0.0743 Lag 9 p-value: 0.0291 Lag 10 p-value: 0.0436
Lag 11 p-value: 0.0637 Lag 12 p-value: 0.0991 Lag 13 p-value: 0.1172 Lag 14 p-value: 0.1308 Lag 15 p-value: 0.0803 Lag 16 p-value: 0.1307 Lag 17 p-value: 0.2270 Lag 18 p-value: 0.2457 Lag 19 p-value: 0.3026 Lag 20 p-value: 0.2081
Lag 21 p-value: 0.2055 Lag 22 p-value: 0.2092 Lag 23 p-value: 0.1964 Lag 24 p-value: 0.1575 Lag 25 p-value: 0.0796 Lag 26 p-value: 0.0536 Lag 27 p-value: 0.0812 Lag 28 p-value: 0.1050 Lag 29 p-value: 0.1358 Lag 30 p-value: 0.2075
Lag 31 p-value: 0.3085 Lag 32 p-value: 0.2376 Lag 33 p-value: 0.2997 Lag 34 p-value: 0.2200 Lag 35 p-value: 0.2211 Lag 36 p-value: 0.1991 Lag 37 p-value: 0.2184 Lag 38 p-value: 0.2266 Lag 39 p-value: 0.0857 Lag 40 p-value: 0.0342
Lag 41 p-value: 0.0243 Lag 42 p-value: 0.0425 Lag 43 p-value: 0.0844 Lag 44 p-value: 0.0879 Lag 45 p-value: 0.1474 Lag 46 p-value: 0.0766 Lag 47 p-value: 0.0734 Lag 48 p-value: 0.0749 Lag 49 p-value: 0.1064 Lag 50 p-value: 0.1526
Lag 51 p-value: 0.1219 Lag 52 p-value: 0.1498 Lag 53 p-value: 0.1546 Lag 54 p-value: 0.1659 Lag 55 p-value: 0.1414 Lag 56 p-value: 0.1115 Lag 57 p-value: 0.1710 Lag 58 p-value: 0.1648 Lag 59 p-value: 0.1842 Lag 60 p-value: 0.1178

Feature: futures_btc_CVD
Lag 1 p-value: 0.5976 Lag 2 p-value: 0.0075 Lag 3 p-value: 0.0101 Lag 4 p-value: 0.0152 Lag 5 p-value: 0.0224 Lag 6 p-value: 0.0313 Lag 7 p-value: 0.0498 Lag 8 p-value: 0.0360 Lag 9 p-value: 0.0479 Lag 10 p-value: 0.0710
Lag 11 p-value: 0.1271 Lag 12 p-value: 0.1946 Lag 13 p-value: 0.0838 Lag 14 p-value: 0.1101 Lag 15 p-value: 0.1291 Lag 16 p-value: 0.1452 Lag 17 p-value: 0.1687 Lag 18 p-value: 0.1887 Lag 19 p-value: 0.1348 Lag 20 p-value: 0.1337
Lag 21 p-value: 0.0701 Lag 22 p-value: 0.1212 Lag 23 p-value: 0.1927 Lag 24 p-value: 0.2970 Lag 25 p-value: 0.3434 Lag 26 p-value: 0.3660 Lag 27 p-value: 0.3826 Lag 28 p-value: 0.2675 Lag 29 p-value: 0.2921 Lag 30 p-value: 0.3775
Lag 31 p-value: 0.4245 Lag 32 p-value: 0.5121 Lag 33 p-value: 0.4879 Lag 34 p-value: 0.5200 Lag 35 p-value: 0.5519 Lag 36 p-value: 0.6156 Lag 37 p-value: 0.4833 Lag 38 p-value: 0.5252 Lag 39 p-value: 0.5686 Lag 40 p-value: 0.4574
Lag 41 p-value: 0.4621 Lag 42 p-value: 0.4537 Lag 43 p-value: 0.4125 Lag 44 p-value: 0.4489 Lag 45 p-value: 0.4455 Lag 46 p-value: 0.5604 Lag 47 p-value: 0.5811 Lag 48 p-value: 0.5994 Lag 49 p-value: 0.4792 Lag 50 p-value: 0.5075
Lag 51 p-value: 0.4834 Lag 52 p-value: 0.4108 Lag 53 p-value: 0.4551 Lag 54 p-value: 0.4996 Lag 55 p-value: 0.5623 Lag 56 p-value: 0.5141 Lag 57 p-value: 0.5895 Lag 58 p-value: 0.5617 Lag 59 p-value: 0.5606 Lag 60 p-value: 0.6271

Feature: futures_eth_close_price
Lag 1 p-value: 0.0081 Lag 2 p-value: 0.0019 Lag 3 p-value: 0.0084 Lag 4 p-value: 0.0128 Lag 5 p-value: 0.0149 Lag 6 p-value: 0.0012 Lag 7 p-value: 0.0000 Lag 8 p-value: 0.0001 Lag 9 p-value: 0.0001 Lag 10 p-value: 0.0000
Lag 11 p-value: 0.0001 Lag 12 p-value: 0.0002 Lag 13 p-value: 0.0003 Lag 14 p-value: 0.0006 Lag 15 p-value: 0.0010 Lag 16 p-value: 0.0012 Lag 17 p-value: 0.0012 Lag 18 p-value: 0.0020 Lag 19 p-value: 0.0031 Lag 20 p-value: 0.0046
Lag 21 p-value: 0.0040 Lag 22 p-value: 0.0079 Lag 23 p-value: 0.0061 Lag 24 p-value: 0.0065 Lag 25 p-value: 0.0076 Lag 26 p-value: 0.0035 Lag 27 p-value: 0.0049 Lag 28 p-value: 0.0020 Lag 29 p-value: 0.0032 Lag 30 p-value: 0.0054
Lag 31 p-value: 0.0032 Lag 32 p-value: 0.0043 Lag 33 p-value: 0.0037 Lag 34 p-value: 0.0036 Lag 35 p-value: 0.0029 Lag 36 p-value: 0.0029 Lag 37 p-value: 0.0024 Lag 38 p-value: 0.0021 Lag 39 p-value: 0.0002 Lag 40 p-value: 0.0003
Lag 41 p-value: 0.0005 Lag 42 p-value: 0.0007 Lag 43 p-value: 0.0007 Lag 44 p-value: 0.0007 Lag 45 p-value: 0.0015 Lag 46 p-value: 0.0020 Lag 47 p-value: 0.0021 Lag 48 p-value: 0.0024 Lag 49 p-value: 0.0035 Lag 50 p-value: 0.0047
Lag 51 p-value: 0.0073 Lag 52 p-value: 0.0060 Lag 53 p-value: 0.0075 Lag 54 p-value: 0.0100 Lag 55 p-value: 0.0135 Lag 56 p-value: 0.0169 Lag 57 p-value: 0.0163 Lag 58 p-value: 0.0164 Lag 59 p-value: 0.0205 Lag 60 p-value: 0.0268

Feature: futures_eth_coin_open_interest_close
Lag 1 p-value: 0.4469 Lag 2 p-value: 0.6873 Lag 3 p-value: 0.4037 Lag 4 p-value: 0.5607 Lag 5 p-value: 0.5075 Lag 6 p-value: 0.5115 Lag 7 p-value: 0.4356 Lag 8 p-value: 0.4328 Lag 9 p-value: 0.2593 Lag 10 p-value: 0.3342
Lag 11 p-value: 0.4255 Lag 12 p-value: 0.4435 Lag 13 p-value: 0.2846 Lag 14 p-value: 0.2344 Lag 15 p-value: 0.2592 Lag 16 p-value: 0.3056 Lag 17 p-value: 0.3700 Lag 18 p-value: 0.4227 Lag 19 p-value: 0.5078 Lag 20 p-value: 0.5663
Lag 21 p-value: 0.5035 Lag 22 p-value: 0.5057 Lag 23 p-value: 0.5013 Lag 24 p-value: 0.5682 Lag 25 p-value: 0.5286 Lag 26 p-value: 0.4716 Lag 27 p-value: 0.4971 Lag 28 p-value: 0.6318 Lag 29 p-value: 0.6744 Lag 30 p-value: 0.7401
Lag 31 p-value: 0.7805 Lag 32 p-value: 0.8043 Lag 33 p-value: 0.8233 Lag 34 p-value: 0.8595 Lag 35 p-value: 0.8469 Lag 36 p-value: 0.7494 Lag 37 p-value: 0.7334 Lag 38 p-value: 0.7504 Lag 39 p-value: 0.6558 Lag 40 p-value: 0.7294
Lag 41 p-value: 0.6791 Lag 42 p-value: 0.7496 Lag 43 p-value: 0.7534 Lag 44 p-value: 0.6156 Lag 45 p-value: 0.6436 Lag 46 p-value: 0.6596 Lag 47 p-value: 0.6263 Lag 48 p-value: 0.6432 Lag 49 p-value: 0.6102 Lag 50 p-value: 0.4923
Lag 51 p-value: 0.5015 Lag 52 p-value: 0.5524 Lag 53 p-value: 0.5960 Lag 54 p-value: 0.6293 Lag 55 p-value: 0.6562 Lag 56 p-value: 0.6675 Lag 57 p-value: 0.7158 Lag 58 p-value: 0.7311 Lag 59 p-value: 0.7383 Lag 60 p-value: 0.7110

Feature: futures_eth_funding_rate
Lag 1 p-value: 0.9624 Lag 2 p-value: 0.9115 Lag 3 p-value: 0.9761 Lag 4 p-value: 0.9680 Lag 5 p-value: 0.8850 Lag 6 p-value: 0.8340 Lag 7 p-value: 0.7901 Lag 8 p-value: 0.8583 Lag 9 p-value: 0.8878 Lag 10 p-value: 0.4411
Lag 11 p-value: 0.6033 Lag 12 p-value: 0.6428 Lag 13 p-value: 0.6781 Lag 14 p-value: 0.5776 Lag 15 p-value: 0.6318 Lag 16 p-value: 0.5983 Lag 17 p-value: 0.8144 Lag 18 p-value: 0.8891 Lag 19 p-value: 0.8637 Lag 20 p-value: 0.9172
Lag 21 p-value: 0.8225 Lag 22 p-value: 0.7461 Lag 23 p-value: 0.7134 Lag 24 p-value: 0.7495 Lag 25 p-value: 0.5421 Lag 26 p-value: 0.5696 Lag 27 p-value: 0.6361 Lag 28 p-value: 0.8729 Lag 29 p-value: 0.8692 Lag 30 p-value: 0.8546
Lag 31 p-value: 0.7902 Lag 32 p-value: 0.8228 Lag 33 p-value: 0.8552 Lag 34 p-value: 0.5951 Lag 35 p-value: 0.5982 Lag 36 p-value: 0.5536 Lag 37 p-value: 0.3794 Lag 38 p-value: 0.4086 Lag 39 p-value: 0.1393 Lag 40 p-value: 0.0929
Lag 41 p-value: 0.1013 Lag 42 p-value: 0.1301 Lag 43 p-value: 0.1471 Lag 44 p-value: 0.1440 Lag 45 p-value: 0.1333 Lag 46 p-value: 0.1641 Lag 47 p-value: 0.1788 Lag 48 p-value: 0.1509 Lag 49 p-value: 0.1066 Lag 50 p-value: 0.0998
Lag 51 p-value: 0.1032 Lag 52 p-value: 0.1742 Lag 53 p-value: 0.1979 Lag 54 p-value: 0.2163 Lag 55 p-value: 0.0633 Lag 56 p-value: 0.0128 Lag 57 p-value: 0.0053 Lag 58 p-value: 0.0074 Lag 59 p-value: 0.0033 Lag 60 p-value: 0.0046

Feature: futures_eth_CVD
Lag 1 p-value: 0.3464 Lag 2 p-value: 0.5830 Lag 3 p-value: 0.5763 Lag 4 p-value: 0.6658 Lag 5 p-value: 0.5733 Lag 6 p-value: 0.6758 Lag 7 p-value: 0.6759 Lag 8 p-value: 0.7565 Lag 9 p-value: 0.6783 Lag 10 p-value: 0.6785
Lag 11 p-value: 0.7666 Lag 12 p-value: 0.8337 Lag 13 p-value: 0.6556 Lag 14 p-value: 0.7247 Lag 15 p-value: 0.4632 Lag 16 p-value: 0.4452 Lag 17 p-value: 0.5039 Lag 18 p-value: 0.4112 Lag 19 p-value: 0.3904 Lag 20 p-value: 0.4919
Lag 21 p-value: 0.3466 Lag 22 p-value: 0.4447 Lag 23 p-value: 0.4339 Lag 24 p-value: 0.5047 Lag 25 p-value: 0.6436 Lag 26 p-value: 0.5273 Lag 27 p-value: 0.4942 Lag 28 p-value: 0.4062 Lag 29 p-value: 0.4286 Lag 30 p-value: 0.4054
Lag 31 p-value: 0.4130 Lag 32 p-value: 0.4037 Lag 33 p-value: 0.4099 Lag 34 p-value: 0.3763 Lag 35 p-value: 0.3808 Lag 36 p-value: 0.4219 Lag 37 p-value: 0.2447 Lag 38 p-value: 0.2733 Lag 39 p-value: 0.2351 Lag 40 p-value: 0.2387
Lag 41 p-value: 0.2437 Lag 42 p-value: 0.2395 Lag 43 p-value: 0.1698 Lag 44 p-value: 0.1809 Lag 45 p-value: 0.2227 Lag 46 p-value: 0.2157 Lag 47 p-value: 0.2282 Lag 48 p-value: 0.2393 Lag 49 p-value: 0.1064 Lag 50 p-value: 0.1017
Lag 51 p-value: 0.0837 Lag 52 p-value: 0.1100 Lag 53 p-value: 0.1271 Lag 54 p-value: 0.1490 Lag 55 p-value: 0.1670 Lag 56 p-value: 0.1873 Lag 57 p-value: 0.2392 Lag 58 p-value: 0.2833 Lag 59 p-value: 0.3059 Lag 60 p-value: 0.3448

Feature: btc_futures_to_spot
Lag 1 p-value: 0.3058 Lag 2 p-value: 0.3870 Lag 3 p-value: 0.0281 Lag 4 p-value: 0.0061 Lag 5 p-value: 0.0104 Lag 6 p-value: 0.0203 Lag 7 p-value: 0.0327 Lag 8 p-value: 0.0290 Lag 9 p-value: 0.0517 Lag 10 p-value: 0.0310
Lag 11 p-value: 0.0305 Lag 12 p-value: 0.0383 Lag 13 p-value: 0.0508 Lag 14 p-value: 0.0511 Lag 15 p-value: 0.0730 Lag 16 p-value: 0.0913 Lag 17 p-value: 0.1158 Lag 18 p-value: 0.1487 Lag 19 p-value: 0.1729 Lag 20 p-value: 0.2124
Lag 21 p-value: 0.1288 Lag 22 p-value: 0.1265 Lag 23 p-value: 0.1794 Lag 24 p-value: 0.1729 Lag 25 p-value: 0.1220 Lag 26 p-value: 0.1167 Lag 27 p-value: 0.1286 Lag 28 p-value: 0.2007 Lag 29 p-value: 0.2193 Lag 30 p-value: 0.2541
Lag 31 p-value: 0.2132 Lag 32 p-value: 0.2654 Lag 33 p-value: 0.2509 Lag 34 p-value: 0.2920 Lag 35 p-value: 0.3414 Lag 36 p-value: 0.1518 Lag 37 p-value: 0.1632 Lag 38 p-value: 0.1009 Lag 39 p-value: 0.1241 Lag 40 p-value: 0.1041
Lag 41 p-value: 0.1303 Lag 42 p-value: 0.1564 Lag 43 p-value: 0.1886 Lag 44 p-value: 0.1955 Lag 45 p-value: 0.2440 Lag 46 p-value: 0.2960 Lag 47 p-value: 0.3226 Lag 48 p-value: 0.3115 Lag 49 p-value: 0.2986 Lag 50 p-value: 0.2799
Lag 51 p-value: 0.2916 Lag 52 p-value: 0.3082 Lag 53 p-value: 0.3335 Lag 54 p-value: 0.3447 Lag 55 p-value: 0.3628 Lag 56 p-value: 0.3166 Lag 57 p-value: 0.3298 Lag 58 p-value: 0.3478 Lag 59 p-value: 0.3103 Lag 60 p-value: 0.1863

Feature: eth_futures_to_spot
Lag 1 p-value: 0.7094 Lag 2 p-value: 0.7336 Lag 3 p-value: 0.8467 Lag 4 p-value: 0.8515 Lag 5 p-value: 0.9172 Lag 6 p-value: 0.0107 Lag 7 p-value: 0.0057 Lag 8 p-value: 0.0117 Lag 9 p-value: 0.0033 Lag 10 p-value: 0.0002
Lag 11 p-value: 0.0002 Lag 12 p-value: 0.0002 Lag 13 p-value: 0.0003 Lag 14 p-value: 0.0005 Lag 15 p-value: 0.0009 Lag 16 p-value: 0.0050 Lag 17 p-value: 0.0131 Lag 18 p-value: 0.0143 Lag 19 p-value: 0.0217 Lag 20 p-value: 0.0331
Lag 21 p-value: 0.0176 Lag 22 p-value: 0.0094 Lag 23 p-value: 0.0020 Lag 24 p-value: 0.0004 Lag 25 p-value: 0.0001 Lag 26 p-value: 0.0001 Lag 27 p-value: 0.0001 Lag 28 p-value: 0.0046 Lag 29 p-value: 0.0038 Lag 30 p-value: 0.0081
Lag 31 p-value: 0.0030 Lag 32 p-value: 0.0044 Lag 33 p-value: 0.0045 Lag 34 p-value: 0.0110 Lag 35 p-value: 0.0159 Lag 36 p-value: 0.0208 Lag 37 p-value: 0.0066 Lag 38 p-value: 0.0040 Lag 39 p-value: 0.0013 Lag 40 p-value: 0.0012
Lag 41 p-value: 0.0008 Lag 42 p-value: 0.0017 Lag 43 p-value: 0.0012 Lag 44 p-value: 0.0017 Lag 45 p-value: 0.0020 Lag 46 p-value: 0.0009 Lag 47 p-value: 0.0010 Lag 48 p-value: 0.0014 Lag 49 p-value: 0.0013 Lag 50 p-value: 0.0012
Lag 51 p-value: 0.0006 Lag 52 p-value: 0.0007 Lag 53 p-value: 0.0005 Lag 54 p-value: 0.0006 Lag 55 p-value: 0.0004 Lag 56 p-value: 0.0004 Lag 57 p-value: 0.0003 Lag 58 p-value: 0.0004 Lag 59 p-value: 0.0004 Lag 60 p-value: 0.0007

Feature: volatility
Lag 1 p-value: 0.5092 Lag 2 p-value: 0.5890 Lag 3 p-value: 0.7529 Lag 4 p-value: 0.7826 Lag 5 p-value: 0.8958 Lag 6 p-value: 0.7576 Lag 7 p-value: 0.8339 Lag 8 p-value: 0.8572 Lag 9 p-value: 0.8375 Lag 10 p-value: 0.9330
Lag 11 p-value: 0.8621 Lag 12 p-value: 0.8481 Lag 13 p-value: 0.8001 Lag 14 p-value: 0.8127 Lag 15 p-value: 0.8304 Lag 16 p-value: 0.7989 Lag 17 p-value: 0.8134 Lag 18 p-value: 0.8192 Lag 19 p-value: 0.8236 Lag 20 p-value: 0.8686
Lag 21 p-value: 0.8893 Lag 22 p-value: 0.9258 Lag 23 p-value: 0.9597 Lag 24 p-value: 0.9344 Lag 25 p-value: 0.9256 Lag 26 p-value: 0.9394 Lag 27 p-value: 0.9287 Lag 28 p-value: 0.9527 Lag 29 p-value: 0.9690 Lag 30 p-value: 0.9570
Lag 31 p-value: 0.9334 Lag 32 p-value: 0.9440 Lag 33 p-value: 0.8974 Lag 34 p-value: 0.9239 Lag 35 p-value: 0.9459 Lag 36 p-value: 0.9415 Lag 37 p-value: 0.8803 Lag 38 p-value: 0.7685 Lag 39 p-value: 0.3883 Lag 40 p-value: 0.2874
Lag 41 p-value: 0.2169 Lag 42 p-value: 0.2537 Lag 43 p-value: 0.2532 Lag 44 p-value: 0.3045 Lag 45 p-value: 0.3585 Lag 46 p-value: 0.3777 Lag 47 p-value: 0.3527 Lag 48 p-value: 0.3333 Lag 49 p-value: 0.3233 Lag 50 p-value: 0.1929
Lag 51 p-value: 0.2233 Lag 52 p-value: 0.2828 Lag 53 p-value: 0.2689 Lag 54 p-value: 0.2940 Lag 55 p-value: 0.2815 Lag 56 p-value: 0.2286 Lag 57 p-value: 0.2735 Lag 58 p-value: 0.1568 Lag 59 p-value: 0.1522 Lag 60 p-value: 0.1123
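With 13 features and 60 lags, the listing above is hard to scan. A minimal sketch of one way to condense it follows, assuming `results` keeps the nested layout that `grangercausalitytests` returns (lag -> (dict of tests, fitted models), where `tests['ssr_ftest']` is a tuple whose second element is the p-value). The helper names `significant_lags` and `_mock`, the 0.05 threshold, and the mock input are assumptions for illustration; the mock only mimics that layout using the first four p-values of two features printed above. The p-values are not adjusted for the many lags and features tested, so isolated values just under 0.05 should be read with caution.

```python
def significant_lags(results, alpha=0.05):
    """Return {feature: [lags whose ssr_ftest p-value is below alpha]}."""
    summary = {}
    for feature, by_lag in results.items():
        summary[feature] = [
            lag
            for lag, (tests, _models) in sorted(by_lag.items())
            if tests["ssr_ftest"][1] < alpha
        ]
    return summary


def _mock(pvalues):
    # Hypothetical stand-in for one grangercausalitytests result:
    # only the p-value slot (index 1) is meaningful, the other entries are dummies.
    return {
        lag: ({"ssr_ftest": (0.0, p, 0.0, lag)}, None)
        for lag, p in enumerate(pvalues, start=1)
    }


mock_results = {
    "futures_btc_CVD": _mock([0.5976, 0.0075, 0.0101, 0.0152]),
    "volatility": _mock([0.5092, 0.5890, 0.7529, 0.7826]),
}

summary = significant_lags(mock_results)
print(summary)
```

Calling `significant_lags(results)` on the real dictionary would give, per feature, the list of lags at which the F-test rejects at the chosen level.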
The lagged cross-correlation analysis that follows uses the same daily data as the Granger causality test.
# Lagged cross-correlation function
def lagged_cross_correlation(x, y, max_lag):
    # The entry for a given lag is corr(x[t], y[t - lag]):
    # negative lags pair x with later values of y (x leads y),
    # positive lags pair x with earlier values of y (y leads x).
    result = [x.corr(y.shift(lag)) for lag in range(-max_lag, max_lag + 1)]
    return np.array(result)
# Parameters
max_lag = 60  # Considering 60 lags as in the Granger causality test
target = 'log_returns'
features = ['spot_btc_coin_volume', 'futures_btc_coin_volume',
            'futures_btc_coin_open_interest_close', 'futures_btc_funding_rate',
            'futures_btc_CVD', 'futures_eth_coin_open_interest_close',
            'futures_eth_funding_rate', 'futures_eth_CVD',
            'btc_futures_to_spot', 'eth_futures_to_spot', 'futures_btc_close_price',
            'volatility']
# Performing Lagged Cross-Correlation Analysis
correlation_results = {}
for feature in features:
    cross_corr = lagged_cross_correlation(predictions_daily[feature], predictions_daily[target], max_lag)
    correlation_results[feature] = cross_corr
    # Plotting the results
    plt.figure(figsize=(8, 4))
    plt.plot(range(-max_lag, max_lag + 1), cross_corr)
    plt.title(f'Lagged Cross-Correlation between {feature} and {target}')
    plt.xlabel('Lag')
    plt.ylabel('Cross-Correlation')
    plt.axhline(0, color='black', linestyle='--', linewidth=1)
    plt.show()
# Reviewing the results
correlation_results_df = pd.DataFrame(correlation_results, index=range(-max_lag, max_lag + 1))
correlation_results_df
| lag | spot_btc_coin_volume | futures_btc_coin_volume | futures_btc_coin_open_interest_close | futures_btc_funding_rate | futures_btc_CVD | futures_eth_coin_open_interest_close | futures_eth_funding_rate | futures_eth_CVD | btc_futures_to_spot | eth_futures_to_spot | futures_btc_close_price | volatility |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| -60 | 0.014233 | 0.033137 | 0.016550 | -0.026041 | -0.035220 | 0.032092 | -0.015272 | -0.012210 | -0.077827 | -0.003179 | -0.050450 | -0.000453 |
| -59 | 0.013942 | 0.025818 | 0.016773 | -0.032339 | -0.035309 | 0.033034 | -0.007351 | -0.012947 | 0.021004 | -0.009731 | -0.049629 | 0.018221 |
| -58 | 0.036573 | 0.057026 | 0.008890 | -0.003345 | -0.032296 | 0.031144 | -0.010387 | -0.009285 | -0.031482 | -0.007223 | -0.046138 | 0.026090 |
| -57 | -0.025084 | -0.046710 | 0.015462 | -0.022825 | -0.030019 | 0.031974 | -0.023196 | -0.007314 | -0.009984 | -0.008615 | -0.042655 | 0.004531 |
| -56 | 0.005471 | -0.003002 | 0.018018 | -0.021976 | -0.026078 | 0.036448 | 0.062498 | -0.002688 | 0.029085 | 0.036311 | -0.040707 | 0.007882 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 56 | 0.017418 | 0.005297 | -0.042958 | 0.070028 | -0.030914 | -0.027475 | 0.108292 | -0.017932 | 0.044458 | 0.076263 | 0.063096 | 0.017158 |
| 57 | 0.013932 | 0.011532 | -0.051296 | 0.082643 | -0.028698 | -0.029822 | 0.026879 | -0.016653 | 0.049372 | 0.070354 | 0.065027 | 0.022192 |
| 58 | 0.006604 | -0.024388 | -0.046506 | 0.083757 | -0.028028 | -0.033877 | 0.012481 | -0.016643 | 0.016932 | 0.036280 | 0.068065 | 0.032196 |
| 59 | -0.000495 | -0.031313 | -0.055574 | 0.050194 | -0.029419 | -0.035908 | 0.045028 | -0.018043 | 0.050128 | 0.059417 | 0.069290 | 0.034203 |
| 60 | -0.002976 | -0.007311 | -0.058969 | 0.033134 | -0.032349 | -0.033739 | 0.015452 | -0.019673 | -0.002414 | 0.052512 | 0.063262 | 0.039023 |
121 rows × 12 columns
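When reading the table, the sign of the lag matters. The sketch below reuses the same `x.corr(y.shift(lag))` definition on synthetic data to check the convention: the series `x` and `y`, the 3-step delay, and the random seed are all assumptions for illustration. Because `y` is built as a copy of `x` delayed by three steps, `x` leads `y`, and the peak should appear at lag -3.

```python
import numpy as np
import pandas as pd


def lagged_cross_correlation(x, y, max_lag):
    # Same definition as in the notebook: entry for `lag` is corr(x[t], y[t - lag])
    return np.array([x.corr(y.shift(lag)) for lag in range(-max_lag, max_lag + 1)])


rng = np.random.default_rng(0)
x = pd.Series(rng.normal(size=500))  # synthetic "feature"
y = x.shift(3)                       # synthetic "target": the feature, 3 steps later

max_lag = 5
cc = pd.Series(
    lagged_cross_correlation(x, y, max_lag),
    index=range(-max_lag, max_lag + 1),
)

# Lag with the largest absolute correlation
peak_lag = int(cc.abs().idxmax())
print(cc.round(3))
print("peak lag:", peak_lag)
```

Under this convention, a peak at a negative lag in `correlation_results_df` indicates that the feature moves before `log_returns`, while a peak at a positive lag indicates that `log_returns` moves first. The same `cc.abs().idxmax()` step, applied column by column, would locate the strongest lag for each feature.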